10-05-2015 08:49 AM
Hi - I currently have a dimension which has ~60,000 rows and 350 variables. The business want to start tracking history and to turn this dimension into a SCD - so I asked them which variables they want to track and they came back with a list of approx 30 variables - my feeling is that this is too many and will have a significant affect on performance but I have nothing to back that up. Has anyone done any analysis or know if SAS recommend the maximum amount of tracked columns will work efficiently.
I tried to do some searching but have not been able to find anything apart from "The number and length of these columns affects the run-time performance of the job" - so I know it will affect performance but the business want to know what the maximum is so they can work on reducing their list.
Any advice would be appreciated.
10-05-2015 11:28 AM
10-05-2015 07:11 PM
It's only 60T rows so I wouldn't be too worried about performance for creating a digest value for thirty type 2 columns.
Where I have some concerns: If you're using the OOTB SCD Type 2 loader then the other 59970 columns are type 1 and the code generated will also create a digest value for all these columns. I'm not sure if the SCD Type 2 loader has been implemented to deal nicely with such a vast number of columns. Make sure that you check the log for truncation warnings and the like. Eventually consider to implement this as a custom transformation with your own code.
As @LinusH wrote: May be some re-modelling is required.
Are your business users aware of the fact that they will have to change all their queries adding a selection on "change_current_ind='Y'" or something similar?
Do you know what your business users want this history records for? May-be maintaining a separate history table or a change event table would serve the purpose as well.
10-05-2015 08:18 PM
Elaborating a bit more.
Business stakeholders are seldom interested in history records, especially when it comes to labels/names.
Sounds like you don't have a detail data store - which is the natural place to keep a complete history record.
Then you can more easily design and create data marts (dimensions and facts) that business requires at the moment. This will hopefully lead to smaller and more easy to use data patterns (and not so much historization, and fewer columns).
Btw, by not track changes as Type 2, it does not necessary mean classifying them as type 1, it just means that they are only updated when any type 2 columns for the row has new value.