Geoff's Business Intelligence Blog: Simplified Modulo in SSIS

OK, last modulo post was rather meandering and overly complex for most people to use. I wrote it mostly for my reference in the future to jog my memory and didn't have the time to go into great detail on how to implement that solution.

Next time I design a large Slowly Changing Dimension I may post it up step by step in greater detail.

With that in mind I thought I'd post a simpler to understand and implement example.

The Algorithm works as follows:

instead of this -

select * from MyLargeTable

Use this query -

select * from MyLargeTable
where (ID %3) = 0
OPTION (MAXDOP 1)

The '3' after the percentage sign dictates how many sections to split the table into and the '= 0' dictates which selection we are selecting. It is a zero-based index.

A simple implementation in SSIS looks like this:

Each OLE DB Source contains the same SQL query as before, but each one uses a different part of the zero-based index (the clue is in the name of the OLE DB Source!).

So there we are - a really simple solution I quarantee will GREATLY improve the speed of large SQL imports, and will also improve the performance of a large dimension when implemented in a bit more context than I had time to write.

N.B. The 'OPTION (MAXDOP 1)' part does something with the processes I can't remember, but I think it's related to running asynchronously and improves performance. Look it up.

Geoff's Business Intelligence Blog

Wednesday, 9 January 2013

Simplified Modulo in SSIS

No comments:

Post a Comment