Merge Join with non-equality operator in SSIS

In TSQL, it is common to encounter scenarios where a join is done between 2-3 data sources and usually done using CTE’s, sub-queries or temptables to do a date range lookups using BETWEEN operator when a baseline date is between min and max dates from different source.

1
2
3
SELECT c.id, o.value
FROM customer c inner join offer o
on c.registereddate BETWEEN o.begindate AND o.enddate

Now it becomes another challenge if both the sources are NOT on the same server or cannot be joined by the linked server because of security restrictions in the environment.

Now if the same problem has to be solved in SSIS, here are few options :

1) sort the sources by dates, merge join, conditional split for date ranges.

2) sort the sources by dates, merge join, script transform for date ranges.

3) use lookup with partial cache (it will have performance impact for a larger dataset) – also called as range lookup.

Setting IsSorted true

IsSorted SSIS

SortKey Position 1

SortKey Position 1

 

Limitations :

1) Please remember that merge join is a blocking transformation (in memory activity) because it has SORT attached to it.

2) MaxBuffersPerInput needs to be tweaked depending on the number of sources that are inputs for the join.

Advantages :

1) The output of merge join also comes out as sorted so subsequent operations can use another merge join in the downstream workflow.

Leave a Comment


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>