Monday, November 19, 2012

SQL Server - XML Schema and DTS to SSIS


I recently had a phone interview in which I was asked about importing XML schemas and converting SQL Server 2000 DTS packages to SSIS 2008 packages.  The interviewer needed to import data from XML files and automatically generate the schemas, and didn't have a lot of time to figure out how.  I had no recent experience with either subject, so I did some research. The answers to the interview questions can be found in the following links and may end up being useful to me at some point now that I have done a little research. I'm sure there is still more complexity to deal with, but basically it just involves using SQLXMLBulkLoad, setting the Bulkload property to false so that no XML data is loaded, and setting the SchemaGen property to true so that the schema will be generated. The data could be loaded this way as well, or BCP could be used to import the XML data. I also found an article on exporting a large XML file which involves creating a BCP format file for the data, so I suspect you could use SQLXMLBulkLoad to create the schema and then BCP to create a format file to be used for importing and exporting the data.
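As a rough sketch, the schema-generation step looks something like the following VBScript. The server, database, and file names are placeholders, and this assumes the SQLXML 4.0 provider is installed:

```vbscript
' Create the tables described by the mapping schema without loading any data.
Set objBL = CreateObject("SQLXMLBulkLoad.SQLXMLBulkload.4.0")
objBL.ConnectionString = "provider=SQLOLEDB;data source=MyServer;" & _
                         "database=MyDatabase;integrated security=SSPI"
objBL.Bulkload = False      ' do not load the XML data
objBL.SchemaGen = True      ' generate the relational schema
objBL.ErrorLogFile = "C:\temp\error.log"
objBL.Execute "MySchema.xml", "MyData.xml"
Set objBL = Nothing
```

Flipping Bulkload back to True would load the data through the same object, which is the all-in-one alternative to the BCP approach described above.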

DataSet Schema Inference Process



Summary of the DataSet Schema Inference Process - 1
Summary of the DataSet Schema Inference Process - 2

The Guru's Guide to SQL Server Architecture and Internals, Chapter 18

XML Bulk Load Examples (SQLXML 4.0) - Note: Search for SchemaGen property.
XML to Schema Wizard (Visual Basic)
Exporting a Large XML File

Examples of Bulk Importing and Exporting XML Documents







Another subject the interviewer was interested in was converting DTS packages to SSIS 2008.
There are a few options for this, described in the following links. 

DTS to SSIS 2008

Forum Discussion with links



Tuesday, October 30, 2012

SSRS - Centralized Formatting

Centralized formatting should really be done via .NET DLLs, but in many organizations short- and long-term maintenance of reporting is more often than not ignored, and the idea of providing development software for building DLLs to people who merely do reporting is discounted.  When this is the case, SSRS report developers have a second option: using code in the reports to replace the functionality the .NET DLL would have provided.  The formatting will not be centralized as it would be in a DLL; however, it is possible to use search and replace on report RDLs to modify the code, thereby simplifying the process of making the same changes to hundreds of reports. This makes using values defined in code the second-best choice.  Make sure to back up the RDLs first, as it is very possible to do something to many reports that will require reverting to the original version. In this situation it is advisable to use code like that shown below to format your report titles, tables, fonts, etc.  This code goes in the code section of your report, and examples of calling it are at the end of this article.  There are two SQLServerCentral.com articles on creating DLLs for this purpose: the first is SSRS - Custom Code with External Assemblies and the second is Centralising Reporting Services Stylesheets. You can use these links along with the code below to create your own custom centralized formatting DLLs.  This Microsoft link on security may also help you get them working.

Public Const NumberFormat = "###,###,###,###,##0.00" 
Public Const PercentageFormat = "##0.00 %"    

  
Function BackgroundColor(ByVal Style As String) As String
    Select Case UCase(Style)
        Case "MAINTITLE", "SUBTITLE1", "SUBTITLE2", "SUBTITLE3": Return "White"
        Case "DATEGROUP": Return "White"
        Case "TABLEHEADER", "TABLEFOOTER": Return "CornflowerBlue"
        Case "GROUP1HEADER", "GROUP1FOOTER": Return "#8fb3f3"
        Case "GROUP2HEADER", "GROUP2FOOTER": Return "#c7d9f9"
        Case "SUBTOTAL": Return "#8fb3f3"
        Case "GRANDTOTAL": Return "White"
        Case "DETAIL", "PAGEFOOTER", "REPORTFOOTER": Return "White"
        Case Else: Return "White"
    End Select
End Function
  
Function FontColor(ByVal BackGroundStyle As String) As String
    Select Case UCase(BackGroundStyle)
        Case "PURPLE", "DARKBLUE", "WHITE", "LIGHTSTEELBLUE", "LIGHTGREY": Return "Black"
        Case "#6e9eca", "#e0e0e0": Return "Black"
        Case Else: Return "Black"
    End Select
End Function
Function FontFamily(ByVal Style As String) As String
    Select Case UCase(Style)
        Case "MAINTITLE", "SUBTITLE1", "SUBTITLE2", "SUBTITLE3": Return "Arial"
        Case "TABLEHEADER", "TABLEFOOTER": Return "Arial Narrow"
        Case "GROUP1HEADER", "GROUP1FOOTER", "GROUP2HEADER", "GROUP2FOOTER": Return "Arial Narrow"
        Case "DATEGROUP", "SUBTOTAL", "GRANDTOTAL": Return "Arial Narrow"
        Case "DETAIL", "PAGEFOOTER", "REPORTFOOTER": Return "Arial Narrow"
        Case Else: Return "Arial Narrow"
    End Select
End Function
Function FontSize(ByVal Style As String) As String
    Select Case UCase(Style)
        Case "MAINTITLE", "SUBTITLE1", "SUBTITLE2": Return "12pt"
        Case "SUBTITLE3": Return "10pt"
        Case "TABLEHEADER", "TABLEFOOTER": Return "8pt"
        Case "GROUP1HEADER", "GROUP1FOOTER", "GROUP2HEADER", "GROUP2FOOTER": Return "8pt"
        Case "DATEGROUP", "SUBTOTAL", "GRANDTOTAL": Return "8pt"
        Case "DETAIL", "PAGEFOOTER", "REPORTFOOTER": Return "8pt"
        Case Else: Return "8pt"
    End Select
End Function
Function FontWeight(ByVal Style As String) As String
    Select Case UCase(Style)
        Case "MAINTITLE", "SUBTITLE1", "SUBTITLE2", "SUBTITLE3": Return "Bold"
        Case "TABLEHEADER", "TABLEFOOTER": Return "Bold"
        Case "GROUP1HEADER": Return "Normal"
        Case "GROUP1FOOTER": Return "Normal" ' was "8pt", which is not a valid font weight
        Case "GROUP2HEADER": Return "Normal" ' was "8pt", which is not a valid font weight
        Case "GROUP2FOOTER": Return "Bold"
        Case "DATEGROUP", "SUBTOTAL", "GRANDTOTAL": Return "Bold"
        Case "DETAIL", "PAGEFOOTER", "REPORTFOOTER": Return "Normal"
        Case Else: Return "Normal"
    End Select
End Function
Function FontStyle(ByVal Style As String) As String
    Select Case UCase(Style)
        Case "MAINTITLE", "SUBTITLE1", "SUBTITLE2", "SUBTITLE3": Return "Normal"
        Case "TABLEHEADER", "TABLEFOOTER": Return "Normal"
        Case "GROUP1HEADER", "GROUP1FOOTER", "GROUP2HEADER", "GROUP2FOOTER": Return "Normal"
        Case "DATEGROUP", "SUBTOTAL", "GRANDTOTAL": Return "Normal"
        Case "DETAIL", "PAGEFOOTER", "REPORTFOOTER": Return "Normal"
        Case Else: Return "Normal"
    End Select
End Function
  
----------------------------------------------------------------------------

Examples of using this code in your report.

<Color>=code.FontColor(code.BackgroundColor("SUBTITLE1"))</Color>
<BackgroundColor>=code.BackgroundColor("SUBTITLE1")</BackgroundColor>
<FontStyle>=code.FontStyle("SUBTITLE1")</FontStyle>
<FontFamily>=code.FontFamily("SUBTITLE1")</FontFamily>
<FontSize>=code.FontSize("SUBTITLE1")</FontSize>
<FontWeight>=code.FontWeight("SUBTITLE1")</FontWeight>

<Color>=code.FontColor(code.BackgroundColor("SUBTITLE2"))</Color>
<BackgroundColor>=code.BackgroundColor("SUBTITLE2")</BackgroundColor>
<FontStyle>=code.FontStyle("SUBTITLE2")</FontStyle>
<FontFamily>=code.FontFamily("SUBTITLE2")</FontFamily>
<FontSize>=code.FontSize("SUBTITLE2")</FontSize>
<FontWeight>=code.FontWeight("SUBTITLE2")</FontWeight>
  


<FontStyle>=code.FontStyle("GROUP1FOOTER")</FontStyle>

<FontFamily>=code.FontFamily("GROUP1FOOTER")</FontFamily>
<FontSize>=code.FontSize("GROUP1FOOTER")</FontSize>
<FontWeight>=code.FontWeight("GROUP1FOOTER")</FontWeight>
<Format>=code.NumberFormat</Format>

Sunday, October 21, 2012

SSIS - Knowledge dump and links.

I've been developing with SQL Server for the past 20 years and have occasionally been tasked with creating an SSIS package or fixing one that was not operating properly.  I have found the occasional use of the data transfer wizard very convenient; however, I usually avoid SSIS packages for a number of reasons I will list at the end of this article. That being said, I work as a contractor and my most recent positions have had me doing a lot of SSIS, so I thought I would share a few of the SSIS basics that were helpful in my last few projects.

Before I start, I almost forgot to add this tidbit. Many import projects require you to run in 32-bit mode, so make sure you have it set, and don't be surprised if you have to keep resetting it. When your package starts dying everywhere again, go back and make sure it's not back in 64-bit mode.  It's a bit annoying, but I don't have a solution for it other than checking the setting.

Error Handling:  In order to keep a package from stopping when it encounters an error in a data flow or some other object in your package, you will need to create an event handler; an empty event handler will do.  First make sure your global variables are showing or available, then select the object or data flow you want to keep from stopping your package and set its MaximumErrorCount property to 0.  Next select the Event Handlers tab and click on "Click here to create an OnError event handler ...". Then set the system variable "Propagate" to false. That's it. Make sure you don't set the MaximumErrorCount property of the actual error handler to zero.

Global temp tables (##MyTempTable): I also wanted to use temp tables in my Execute SQL Task objects.  In order to do this you need to set the connection manager property RetainSameConnection to true and also set DelayValidation to true on all tasks that use the temp table. One thing I found that I didn't see mentioned anywhere else is that global temp tables are evidently only global within one data-flow path. For instance, if you use two data-flow paths to parallel process some functionality, the temp tables are only global to the data-flow path they are being executed in.  As far as I could tell, you can't use global temp tables from two different data-flow paths together in the same Execute SQL Task.

Excel Imports: One of the projects I've worked on recently was importing Excel files into SQL Server, applying some business rules to the data, and updating or inserting the data into some tables. This sounds pretty easy until you start finding out how the Jet drivers decide how to import Excel data.
I read a myriad of articles explaining how to import the data, and of course my project was importing macro-enabled spreadsheets, so it required using the ACE.OLEDB.12.0 driver and setting some extended properties (Excel 12.0 Macro;HDR=Yes or Excel 12.0 Macro;HDR=No). If Excel is not installed on the machine, the Microsoft Access Database Engine 2010 Redistributable needs to be installed.
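For reference, the full connection string ends up looking something like the following. The file path is a placeholder, and IMEX=1 is an optional extra that forces mixed-type columns to be read as text:

```
Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Imports\MyWorkbook.xlsm;
Extended Properties="Excel 12.0 Macro;HDR=YES;IMEX=1";
```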

After scouring a lot of Internet articles and a lot of import testing, I finally found a description of how the driver determines how it will import data. Please read the article for yourself at http://www.etl-tools.com/imex1.html and the related information at Microsoft kb/257819. In short, it sounds like Excel just isn't a good import/export tool.  The only way to get reliable data imports from Excel is to use the first row to control how Excel imports the data. You should also beware of Excel data imports because they were probably exported using the standard Excel driver and have probably truncated some text at 255 characters.

So here is what you do to get your data imported from Excel.  First you need to create a template with an example of the file structure in the first row of the spreadsheet, or you need to manually open the spreadsheet and, if necessary, insert a row at the top or use the first row to provide an example data set. In my situation I wanted to import everything as text so I could import all the spreadsheets into the same data-cleansing table and then deal with cleansing the data and converting the data types in SQL Server.  I suspect you can hide the first row and use it as your data description, but I didn't have the need to try it, as I had an empty first row to use and I used blanks to indicate to Excel what was to be imported.  The only thing I really had to do was make sure one of the columns was 4000 characters long so Excel would know it was long text and I wouldn't get metadata errors in SSIS when attempting to import it.

You also need to make sure the TypeGuessRows registry setting is set to 1 so the driver will use the first row as the data import specification. I check it first and save the value so I can reset it to the original value when I'm done.  That way other people doing Excel imports will not be affected by the change, except if they try to import data at the same time. The service account running the DB Engine needs to be given privileges to write to the registry, so check the computer services or use the SQL Server Configuration Manager to find out what account is running the SQL Server DB Engine. Then, if necessary, use RegEdit to give permissions to the account; there is a menu option in RegEdit for this. Below is some T-SQL code to read and write the registry values on 64-bit machines. 

----------------------------------------------------
-- Check for provider:Microsoft.ACE.OLEDB.12.0
-------------------------------------------------------------
DECLARE @rc int, @intRegValue int, @msg varchar(1000)
SET @intRegValue = -1
exec @rc = master.dbo.xp_regread
 @rootkey='HKEY_LOCAL_MACHINE'
, @key='SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\'
, @value_name= 'TypeGuessRows'
, @value = @intRegValue  output
IF @rc <> 0
OR @intRegValue = -1
BEGIN
SET @msg = 'Error reading Excel TypeGuessRows.  The Excel file import provider Microsoft.ACE.OLEDB.12.0 is probably not installed.'  + CHAR(13)
SET @msg = @msg  + 'You may need to install the Microsoft Access Database Engine 2010 Redistributable which can be found at the URL address ' + CHAR(13)
SET @msg = @msg  + 'http://www.microsoft.com/en-us/download/details.aspx?id=13255'
PRINT @msg
END

---------------------------------------------------------------
-- Set the TypeGuessRows registry value.
---------------------------------------------------------------
exec @rc = master.dbo.xp_regwrite
@rootkey='HKEY_LOCAL_MACHINE'
, @key='SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\'
, @value_name='TypeGuessRows'
, @type='REG_DWORD'
, @value=1
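
To be a good citizen, write the saved value back once the import finishes. A sketch, reusing the @intRegValue captured by the read above:

```sql
---------------------------------------------------------------
-- Restore the original TypeGuessRows value saved earlier.
---------------------------------------------------------------
IF @intRegValue <> -1
BEGIN
    exec @rc = master.dbo.xp_regwrite
      @rootkey='HKEY_LOCAL_MACHINE'
    , @key='SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\'
    , @value_name='TypeGuessRows'
    , @type='REG_DWORD'
    , @value=@intRegValue
END
```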

File System Operation

I tried using the SSIS File System objects to implement file system operations, but I found C# to be a better option. A few things I needed to do were to loop through directories, read files from the disk in order of write date, read the file properties, execute stored procedures, perform and set up some logging, delete all the files in a directory, etc.

Parallel processing is definitely an advantage of SSIS, but aside from that, for a number of reasons, whenever possible I prefer to use SQL Server T-SQL and CLRs, or at least to call them from SSIS. Here are a few of the reasons.




  • I already have a T-SQL code base of about 1000 procedures, 300 functions and some CLRs that provide me with file system and other functions, and it's all set up pretty well for re-usability. This lets me develop reusable code and functionality such as data imports.
  • The code for SSIS packages can't just be printed out and reviewed like TSQL and C# code.  
  • The SSIS graphical interface is often painfully slow, especially with complex packages.
  • SSIS often gives less than helpful error messages, making it difficult to see problems that are readily displayed in T-SQL or Oracle Developer. It does provide ways to see a ton of messages, but unless configured carefully it is more likely to produce information overload and make it hard to see the actual problems. 






    Thursday, June 21, 2012

    Four Scales, 28 Modes: flat 3s and flat 6s

    There are 7 modes in a scale, and on a guitar there are 3 different patterns for each scale. Learn all of them and you can play 90 percent of pop, rock and country music. There is also the harmonic minor scale, an Arabic-sounding scale that is also great for rock, Spanish and Mediterranean sounds, and the melodic minor scale, which is used frequently in jazz. If we include the harmonic major scale, our music improvisation theory lesson covers using the flat 3s and flat 6s of any scale. Add to that the whole-half and half-whole diminished scales and the augmented scale, and you have just about every possible pattern to play.
    Learn to arpeggiate all 28 modes and you'll have a serious upper hand in improvisation.

    Guitar Lessons

    Wednesday, April 18, 2012

    Natural Key Unique Record Constraint

    The single most important aspect of table design, view design and handling data in general is a real-data/natural-key unique record constraint/definition (URIX), which defines and enforces data granularity (what exists in the table). If a natural key unique record constraint hasn't been defined, then data aggregated from the table is meaningless. Additionally, without a proper unique record constraint the table data could be duplicated or worse, and no one might ever recognize duplicate data, since exactly what constitutes duplicate data has not been defined. SQL GROUP BY statements can be used to determine the natural key unique record constraint for a table.

    With traditional MS Access applications the only way to implement unique record constraints is through the use of primary keys (PK) or composite primary keys, which increases the complexity of designing Access applications and is seldom done. Referential integrity is normally set up using auto-number/surrogate keys as the PK, which adds some confusion to the purpose of the primary key, as it no longer serves as a unique record constraint. While using a surrogate key to implement referential integrity may be optional, having a natural key unique record constraint is mandatory if you want to ensure data integrity. Fortunately, in SQL Server the unique record constraint can be implemented as a unique index or unique composite index while the surrogate key used to configure referential integrity remains the actual primary key. This allows developers to locate a specific record with a WHERE clause specifying a single value and reduces the links between tables in schema diagrams. More complicated data constraints are implemented using table triggers, such as allowing only one record in a defined set of records to be marked as the active or primary record (a primary phone number or primary address, for example); unfortunately, this can only be done at the form level in MS Access. Implementing unique record constraints in MS Access web applications takes yet another approach, as the primary key has to be a single surrogate key and unique record constraints are implemented through the use of data macros similar to SQL Server triggers.
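
    As a sketch of both ideas, with hypothetical table and column names: a GROUP BY query to discover whether a candidate natural key is actually unique, and a unique composite index to enforce it once confirmed:

```sql
-- Check whether (CustomerNumber, AddressType) uniquely identifies a row.
-- Any rows returned are duplicates under the candidate natural key.
SELECT CustomerNumber, AddressType, COUNT(*) AS DuplicateCount
FROM dbo.CustomerAddress
GROUP BY CustomerNumber, AddressType
HAVING COUNT(*) > 1

-- Once confirmed, enforce the natural key with a unique composite index,
-- leaving the surrogate key column as the primary key.
CREATE UNIQUE INDEX URIX_CustomerAddress
    ON dbo.CustomerAddress (CustomerNumber, AddressType)
```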

    As should be obvious by now, the main goal of this blog article is to stress the importance of real-data/natural-key unique record constraints (URIX). Unfortunately, this relatively simple concept is often overlooked and is likely the main cause of garbage data.  The URIX is also the most important aspect of joining tables in SQL statements/views. Since the URIX defines the data contained in a table, when two tables are joined (foreign key (FK) to primary key (PK)) to create a view and the URIX/PK of the second table is not fully contained in the URIX of the first table, the URIX of the query or view ends up being a combination of the URIX of the first table plus any new columns from the PK/URIX of the second table. Assuming the second table is joined using a left outer join, the record count of the view will now reflect the record count of the first table plus the count of matching records in the second table, grouped on the linked columns, having a count greater than one. In short, the URIX is the single most important aspect of both tables and SQL statements and, if not understood and used correctly, the cause of row duplication in SQL statements.

    The following articles contain related information: