Hello
I want to ask a question about efficiency.
In my work every month a data set (raw data) of Loans balances is created.
What is more efficiency way:
option1- Every month create another raw data data set
option2- Every month add new rows to the previous month data set.
What do you think is a better way to do it?
Why?
You need to know that lots of queries will be written based on these raw data tables.
Thanks
Joe
If your queries/reports are mainly run for single months, it makes sense (performancewise) to keep the datasets separate.
Splitting same data is rarely a good idea. Think of it this way, if you have multiple datasets split by date you:
1) have lots more files with their own header block - takes more space
2) have to write messy code to process lots of different dataset names - takes more time an resources to do so and is less stable
3) have code which is opening, closing, writing etc. data lots of times, inefficient for processing power
Hence the conclusion I would come to is no, don't split the data out into sections, have one single table with a fixed simple structure which is very easy to program against, query etc. Just look at how the big databases do it, Oracle has a few tables with simple structures to hold lots of data.
As a bit of a disclaimer, this is general, if your looking at something more specific like big data, or disparate data then there are other models for such a thing. But in general, for most tasks, keep it simple, keep it generic.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.