Cache Transformation:
A new feature added in SSIS 2008. This transformation creates a Cache file
which can be used in Lookup Transformation whereby reducing the time &
memory overload to create a Cache file during execution.
If wisely used this transformation can be used to perform lookup on a flat
file data without storing it in a database table.
To prove this statement, let’s take an Employee – Department example, in which we will generate a flat file which will hold employee id, employee name and its department id and name. In this, employee details are stored in database table, whereas Department details are stored in a flat file.
Employee data present in Database table:

Department data present in flat file:

First we will create a Cache file for Department flat file, which we will be used in the lookup:
1. Configure the flat file source to read the Department flat file:
As shown in the below screen shot, drag ‘Flat file source transform’ and map the department flat file to it:

2. Configure the Cache Transformation:
Drag the “Cache Transform” from the Data Flow Transformations. Map Flat file Source to ‘Cache Transform’. Open cache transform Editor to configure it.

Click on “New” to create “Cache Connection Manager” which will be used to create the Cache file.

In ‘General’ tab, type in ‘Connection Manager Name’, check ‘Use file cache’ checkbox and click on ‘Browse…’ select the location where you want to store the Cache file. In this case, we have given the Cache file name ‘lkp_Dept.caw’. “caw” is the Cache file extension.
Under ‘Columns’ tab, set ‘Index Position’ as 1 for column ‘Dept_id’. Index Position is sequential positive number. This number indicates the order in which lookup transformation compares rows in the reference dataset to rows in the input data source. The column with the most unique values should have lowest index position (Reference: http://msdn.microsoft.com/en-IN/library/bb895364.aspx)

After configuring the Cache Connection Manager, go back to Cache Transform Editor and click on ‘Mappings’ tab, as shown in the below screen shot map the source columns to its corresponding Cache file column.

3. Load Cache file:
Now we will execute the above created flow to load the cache file with the records present in the Department flat file.

Now if you check the lookup file path you will see Cache file created.

Now that we have created the cache file, we will create a Data flow to create the target file which will load the employee and its department information, department info will be looked up from above created Cache file ‘lkp_Dept.caw’.
As shown below, configure the ‘OLE DB Source’ to read the Employee data from the database table. This will be followed by ‘Lookup Transform’ which will be configured to read data from above created Cache file. Open the lookup transformation editor, in ‘General’ tab select ‘Connection Type’ as ‘Cache Connection Manager’

Under ‘Connection’ tab, select the Cache Connection manager, which was created in earlier DFT to create the cache file:

In ‘Columns’ tab, map Input Dept_Id column with lookup Dept_id:

Click 'OK'.
Now we will drag ‘Flat File Destination’, and map ‘Lookup Match Output’ to it. Create a new flat file connection, which we load the target file.

In mappings tab, map the source column to its corresponding target file column:

Click 'OK'.
Now we will execute the DFT to load the target file.

Data in Target file:

There are few things that about Cache Transformation to be remembered:
1. The Cache Transform writes only unique rows to the
Cache connection manager:
Let’s add a duplicate record in the Department flat file and re-run the
cache file creation data flow.
Now to view the cache file, we will create a new data flow using ‘Raw File Source’. As shown in the below screen shot, drag ‘Raw file Source’, under ‘connection manager’ select ‘Access mode’ as ‘File name’, click ‘Browse…’ select the above created cache file.

Now drag a Derived Column transformation, map Raw file output to it. Configure data viewer to view the Cache file:

Execute the data flow:

From the above screenshot you can see, department file has 4 rows, but the cache file holds only 3 rows, as the fourth row is duplicate, based on Dept_Id which is been assigned Index_Position 1, in cache transformation.
2. In a single package, only one Cache Transform can write data to the same
Cache connection manager. If the package contains multiple Cache Transforms, the
first Cache Transform that is called when the package runs, writes the data to
the connection manager. The write operations of subsequent Cache Transforms
fail.
To test this, we will create a copy of the data flow created earlier to
load the cache file. Now while executing the entire package
we can see the last DFT which a copy of first one gets error out. Leading to package failure.
Error:

To avoid this failure, we need to create a new Cache Connection for this DFT.
No comments:
Post a Comment