BookmarkSubscribeRSS Feed
rcw68
Calcite | Level 5

Hi - I currently have a dimension which has ~60,000 rows and 350 variables. The business want to start tracking history and to turn this dimension into a SCD - so I asked them which variables they want to track and they came back with a list of approx 30 variables - my feeling is that this is too many and will have a significant affect on performance but I have nothing to back that up.  Has anyone done any analysis or know if SAS recommend the maximum amount of tracked columns will work efficiently.

I tried to do some searching but have not been able to find anything apart from "The number and length of these columns affects the run-time performance of the job" - so I know it will affect performance but the business want to know what the maximum is so they can work on reducing their list.

 

Any advice would be appreciated.

Thanks

Rachel

3 REPLIES 3
LinusH
Tourmaline | Level 20
Since the dimension has quite few rows, I don't think that 30 columns is too much, at least from a performance view.
But one can ask why do you have a dimension with 350 cols? What user can grasp this amount of information? For me, it sounds like you have some data modelling tasks comming up.
Data never sleeps
Patrick
Opal | Level 21

It's only 60T rows so I wouldn't be too worried about performance for creating a digest value for thirty type 2 columns.

 

Where I have some concerns: If you're using the OOTB SCD Type 2 loader then the other 59970 columns are type 1 and the code generated will also create a digest value for all these columns. I'm not sure if the SCD Type 2 loader has been implemented to deal nicely with such a vast number of columns. Make sure that you check the log for truncation warnings and the like. Eventually consider to implement this as a custom transformation with your own code.

 

As @LinusH wrote: May be some re-modelling is required.

 

Are your business users aware of the fact that they will have to change all their queries adding a selection on "change_current_ind='Y'" or something similar?

 

Do you know what your business users want this history records for? May-be maintaining a separate history table or a change event table would serve the purpose as well.

LinusH
Tourmaline | Level 20

Elaborating a bit more.

Business stakeholders are seldom interested in history records, especially when it comes to labels/names.

Sounds like you don't have a detail data store - which is the natural place to keep a complete history record.

Then you can more easily design and create data marts (dimensions and facts) that business requires at the moment. This will hopefully lead to smaller and more easy to use data patterns (and not so much historization, and fewer columns).

 

Btw, by not track changes as Type 2, it does not necessary mean classifying them as type 1, it just means that they are only updated when any type 2 columns for the row has new value.

Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 885 views
  • 0 likes
  • 3 in conversation