SSIS-F1: July 2014

Sunday, 20 July 2014

SSIS 2008 - Sort Transformation

SORT Transformation:

Name itself tells us the use of this Transformation. This transformation requires one input and provides one output. It does not support ‘Error Routing’. Along with sorting the data, this transformation also provides the option to ‘Remove duplicate records based on the sort keys’. Based on the options selected, it can behave as a Synchronous or Asynchronous transformation.

As a Synchronous transformation: If this transformation is configured only to sort the data, then it will behave as a Synchronous transformation, whereby not changing the number of records it read and provided as output.

As a Asynchronous transformation: Along with sorting the data, if this transformation is configured to ‘Remove duplicate records based on Sort key’, then this will behave as a Asynchronous transformation, whereby the number of records given as output will be less than what it read (if there are any duplicates).

Below example will walk you through the various options available in this transformation:

Sample data:

Col_A	Col_B	Col_C
1	600	A
1	100	a
2	200	b
3	300	c
4	400	d
5	500	e
2	700	B
3	800	c
4	900	d
5	1000	e

1. Sorting data using key column having Numeric data type:
For this we will use Col_A from the sample data, which is having Numeric data type. Below shown is the Sort transformation editor screenshot configured to use Col_A as the ‘Sort Key’:

Output:

2. Sorting data using key column having String datatype:
For this we will use Col_C from the sample data, which is having Numeric data type. Below shown is the Sort transformation editor screenshot configured to use Col_C as the ‘Sort Key’:

Output:

Using String datatype as SORT key makes one more option available in the editor named ‘Comparison Flag’. Below screenshot shows the various options available in it:

From the available options we will use ‘Ignore Case’.
Output:

If you compare this output with the earlier output, you will see the difference:
Sample data:

Output without ‘Ignore Case’:

Output with ‘Ignore Case’:

3. Eliminating duplicates:For this we will use Col_A from the sample data. Below shown is the Sort transformation editor screenshot configured to use Col_A as the ‘Sort Key’ and option ‘Remove rows with duplicate sort values’ been checked:

Output:

SSIS 2008 - Derived Column Transformation

Derived Column Transformation:

As per the name whenever we want to derive a new column in our DFT based on an existing column or a static value or based on variable value, derived column transformation can be used. This transformation requires one input and provides only one output. It’s a synchronous transformation which supports Error routing.
Below mentioned example explains in more detail various features available in derived column transformation:

Sample data:

SalesRep_Id	Month	Year	Region	Country_Code	Unit_Sold
1	Jan	2011	Asia	IN	100
2	Feb	2011	EU	UK	150
3	March	2011	US	US	175
4	Jan	2012	Asia	IND	150
5	Feb	2012	EU	UK	200
6	March	2012	US	US	100
7	Jan	2013	Asia	IN	250
8	Feb	2013	EU	uk	300
9	March	2013	EU	UK	350

Before inserting the above sample data into the target,

1 1. We will check whether the country_code is of length 2, if not then it will be redirected to error output and those with length 2 will be converted to uppercase, if they are not.

2. We will add a new column “Insert_dt” using in-built Date-Time function “GETDATE()”.

As shown in the below screenshot, above sample data is been stored in a excel file. Excel Source is used to read the data which is been mapped to Derived Column Transformation ‘Der_Transform’:

To implement the first scenario, in the derived column transformation editor, under ‘Derived Column Name’ type in ‘Der_Cntry_Code’, in ‘Derived Column’ select ‘<add as new column>’, under expression type in “(DT_STR,2,1252)(LEN(Country_Code) <= 2 ? (DT_STR,2,1252)(Country_Code == UPPER(Country_Code) ? Country_Code : UPPER(Country_Code)) : Country_Code)”. In this expression, we are checking whether the length of input Country_Code is less than equal to 2 or not, if it is then, will check if it is in Upper case or not, if it is then the input column will be send ahead as it is, if not then it will be converted to Upper case using ‘UPPER()’ function and if the length is more than 2 then it will send unchanged.

If the incoming data is having a length more than 2, then in that case above expression will error out, resulting into its failure. To avoid this, we will ‘Configure Error Output’. From the above screen shot click on ‘Configure Error Output’. It will open ‘Configure Error Output Editor’. As shown in the below screenshot under ‘Truncation’ select ‘Redirect Row’.

This will cause the given input record route to Error Output, if its length is more than 2. Now will we map the output of the Der_Transform to two derived column transformation Valid_Country_code and other Invalid_Country_Code, Green arrow will be mapped to Valid_Country_code and Red arrow (Error handling) to Invalid_Country_Code.

Now to implement the second scenario, we will again open the Der_Transform. As shown in the below screenshot we will name the new column as ‘Der_Insert_DT’. In Derived Column ‘<add as new column>’ will be selected. Under Expression in-built function ‘GETDATE()’ will be used which can be found under ‘Date/Time Functions.’

Output:

Thursday, 17 July 2014

SSIS 2008 - Multicast Transformation

Multicast Transformation:

Simplest transformation of SSIS. As per the name Multicast transformation creates multiple copies of provided dataset. This transformation does not support Error Routing. This transformation has one input and multiple outputs.

Below screen shot shows, multicast transformation routing all 9 source records to all the 5 flows:

Difference between Conditional Split and Multicast:

Conditional split, routes the data to different flows based on the Condition specified, if none of the condition matches, then in that case it routes that particular row to default output. Due to this a given record gets routed to only one flow. Whereas in case of multicast, we don’t specify any condition which results in routing a record to all available outputs.

Uses of Multicast Transformation:
Let's say we have a DFT which reads the data from Staging table and loads it into datawarehouse performing a 'Type 2 SCD', which has a custom logging been implemented in which we are tracking how records got inserted(new records) and how many got updated. For this DFT, in insert flow just before destination a multicast can be placed which will route one flow to destination for insertion, and other flow can be mapped to Row Count transformation which we will assign that count to a pre-defined variable which can further be used for inserting that row count value in logging table. Same logic can be implemented for Update flow.

Wednesday, 16 July 2014

SSIS 2008 - Conditional Split Transformation

Conditional split Transformation

As per the name this transformation splits the given dataset into different subsets based on the provided condition. This transformation has one input and provides more than one output. Supports ‘Error Routing’.

This transformation provides default output as well which is similar to else statement in a given 'if-elseif-elseif-else' statement of programming languages. Conditions entered should evaluate to Boolean 'TRUE' or 'FALSE'. Every output can be given its own custom 'Output Name' which can help us to identify which output needs to be mapped to which flow.

The most important point to be remembered is 'order of the conditions is most important as a given row will get routed to the output flow for which the condition evaluates TRUE and won't be evaluated against remaining conditions'.

Below example shows how to configure Conditional split transformation and the importance of having correct order for Conditions.

Sample data:

SalesRep_Id	Month	Year	Region	Country_Code	Unit_Sold
1	Jan	2011	Asia	IN	100
2	Feb	2011	EU	UK	150
3	March	2011	US	US	175
4	Jan	2012	Asia	IN	150
5	Feb	2012	EU	UK	200
6	March	2012	US	US	100
7	Jan	2013	Asia	IN	250
8	Feb	2013	EU	UK	300
9	March	2013	EU	UK	350

Conditional Split Transformation editor:

Based on the above order of conditions, the output will be as shown below:

Now if you check the sample data again, you will find that there are 6 records which have ‘sales <=200’, but in the conditions sequence, this condition is placed at the end, due to which these 6 records get routed to their respective country flow. To fix this we need to change the conditions order. To do this we can use the arrow buttons placed next to the conditions. New Corrected conditions sequence is as shown below:

Output:

To consolidate, Conditional Split transformation can be used to route the rows as well to filter out unwanted rows in the middle of the data flow.