Recently I created SSIS packages to handle data from a spreadsheet for a client. After finishing the packages, I realized there was a problem. One SSIS package added two tables to my database in order to map an incorrectly typed value in a specific column of any spreadsheet to the correct value. Let's use the following spreadsheets to better describe exactly what I mean. The tables in the database are as follows:
The SSIS package that moves the data from Table One into the database looks like this:
The Move CSV to StandardName Data Flow
The Flat File source “CSV file” extracts the data from a file that contains Table One’s data. The Sort transformation sorts the data by the StandardName column and removes any rows that duplicate a value already seen in the StandardName column. The OLE DB destination “StandardName” moves the sorted StandardName column values into the database to make picture C above.
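The sort-and-deduplicate step can be sketched in plain Python (this is only an illustration of the Sort transformation's behavior, not the SSIS component itself; the dict shape of the rows is an assumption for the sketch):

```python
def sort_and_dedupe(rows):
    """Sort rows by StandardName and drop rows whose StandardName
    was already seen, mirroring the Sort transformation's
    'remove rows with duplicate sort values' option."""
    seen = set()
    result = []
    for row in sorted(rows, key=lambda r: r["StandardName"]):
        if row["StandardName"] not in seen:
            seen.add(row["StandardName"])
            result.append(row)
    return result
```

Each distinct StandardName survives exactly once, which is what makes the resulting table safe to use as a reference list.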
The data flow task in picture C “Move CSV to StandardName Lookup” works in the following way:
Again, the Flat File source “CSV file” extracts the data from the file that contains Table One’s data. The Lookup transformation finds the matching number in the ID column of the PracticeName table in the database and adds it as a new column called Lookup ID. The “StandNameLookup” OLE DB destination puts the data into the PracticeNameLookup table, which is shown in picture E above.
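The Lookup transformation's join can be modeled as a dictionary lookup. This is a minimal sketch, assuming the match is made on the StandardName value and that the reference table is available as a name-to-ID mapping (both assumptions for illustration, not the SSIS component's actual configuration):

```python
def add_lookup_id(rows, name_to_id):
    """Append a 'Lookup ID' column to each row by matching the row's
    StandardName against the reference mapping (name -> ID)."""
    out = []
    for row in rows:
        enriched = dict(row)  # copy so the source rows stay untouched
        enriched["Lookup ID"] = name_to_id[row["StandardName"]]
        out.append(enriched)
    return out
```

A name missing from the mapping raises a KeyError here, which is the same class of failure the real Lookup transformation surfaces when a row has no match.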
There were a couple issues with importing the Table One data to the database:
- What if a column had a blank value? A blank cell could cause data to be matched to a blank value even when the cell should have contained something, or cause a blank cell to match a value when nothing should be in that cell.
- What if a spreadsheet cell had the correct name, but that name was not listed correctly in both the StandardName column and the ListedName column? In that case the entered name and the listed name would not match.
Something was needed to ensure that these two problems were tested for before the Table One data was moved into the database tables.
First, add a variable to the package, set the variable name, and set the data type to Object. Pictured below is an example of this with the variable name ObjectArray.
Then update the SSIS package:
The Flat File source “CSV file” extracts the data from the CSV file containing the Table One data. The Recordset destination moves the data from the CSV file into the object variable by setting the variable name (image H) and the input columns (image G) in the Recordset destination settings.
The script task “Check Data” checks the data in the object variable to see if any cells are missing values, to see if there is a row with the same value in the StandardName and ListedName columns, and to make sure there are no duplicate values in the ListedName column. The script task does this in the following way:
Open the script task. Set the ‘ReadOnlyVariables’ to the ‘ObjectArrayVariable’ (Image I). Then click on the edit script button. Inside the main function the following happens:
- Set the variables (image J). The variables hold the emailing information, track which row of data is being processed, and keep a list of the ListedName column values that have already been tested.
- A for each loop (Image K) loops through each row in the data set.
- First, variables are set for the StandardName and ListedName values in the current row.
- An ‘If’ statement checks whether either of those values is blank. If there is a blank value, an email is sent showing which row of data has a blank value. The package is then forced to fail with the code Dts.TaskResult = (int)ScriptResults.Failure
- An ‘If’ statement then checks whether the row’s value in the ListedName column already appeared in a previous row
- The ‘onRow’ variable is incremented by one
- The listed variable is added to the listedData variable
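The loop above lives in the script task's C# Main function; its logic can be sketched in Python like this (the column names and error-message wording are illustrative, and the email send is replaced by collecting the messages to return):

```python
def check_data(rows):
    """Loop over every row and collect one error message per problem
    found: a blank cell, or a ListedName value seen in a previous row.
    Row numbering starts at 1, like the onRow counter in the script task."""
    errors = []
    listed_seen = set()   # plays the role of the listedData variable
    on_row = 1            # plays the role of the onRow variable
    for row in rows:
        standard = row.get("StandardName", "")
        listed = row.get("ListedName", "")
        if standard == "" or listed == "":
            errors.append(f"Row {on_row} has a blank value")
        if listed in listed_seen:
            errors.append(f"Row {on_row} duplicates ListedName '{listed}'")
        on_row += 1
        listed_seen.add(listed)
    return errors
```

In the real package, a non-empty error list corresponds to sending the email and setting the task result to failure; an empty list lets the rest of the package run.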
This update to the SSIS package fixes the problems by making the package fail if the script task fails, so the data from Table One won’t be moved into the database. It also does one better by sending an email to a set email address letting the recipient know which rows of data contain errors.
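The notification step can be sketched with Python's standard email library (the original script task does this in C#; the subject line, addresses, and SMTP server below are all hypothetical placeholders):

```python
import smtplib
from email.message import EmailMessage

def build_failure_email(bad_rows, sender, recipient):
    """Compose the failure-notification message listing the bad rows."""
    msg = EmailMessage()
    msg["Subject"] = "SSIS package failed: bad rows in CSV"  # illustrative subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("These rows contain errors:\n" + "\n".join(bad_rows))
    return msg

# Sending the message (left commented out; the server name is hypothetical):
# with smtplib.SMTP("smtp.example.com") as server:
#     server.send_message(build_failure_email(
#         ["Row 3 has a blank value"], "ssis@example.com", "dev@example.com"))
```

Separating message construction from sending keeps the row-error formatting easy to test without a mail server.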
What Was Learned:
- Two tasks, a data flow task and script task, should be put before whatever tasks the SSIS package would originally do.
- A package variable with a data type of Object should be set to record the data in the table.
- A data flow task can be used to move the CSV table data into the variable that was set.
- A script task can be used to loop through each row and check if there is something that should not be there.
- A script task can be used to fail the package and send information about the failure to any desired email address.
Before making SSIS packages, a developer must take into account all possible scenarios that could ruin the data. Failure to do so may cause unforeseen issues to arise throughout the process.