DATA Step, Macro, Functions and more

One big data set or multiple data sets

Regular Contributor
Posts: 199

One big data set or multiple data sets


I want to ask a question about efficiency.

In my work every month a data set (raw data) of Loans balances is created.

What  is more efficiency way:

option1- Every month create another raw data data set 

option2- Every month  add new rows  to the previous month data set.

What do you think is a better way to do it?


You need to know that lots of queries will be written based on these raw data tables.





Super User
Posts: 10,570

Re: One big data set or multiple data sets

If your queries/reports are mainly run for single months, it makes sense (performancewise) to keep the datasets separate.


Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Super User
Posts: 9,840

Re: One big data set or multiple data sets

Splitting same data is rarely a good idea.  Think of it this way, if you have multiple datasets split by date you:

1) have lots more files with their own header block - takes more space

2) have to write messy code to process lots of different dataset names - takes more time an resources to do so and is less stable

3) have code which is opening, closing, writing etc. data lots of times, inefficient for processing power


Hence the conclusion I would come to is no, don't split the data out into sections, have one single table with a fixed simple structure which is very easy to program against, query etc.  Just look at how the big databases do it, Oracle has a few tables with simple structures to hold lots of data.


As a bit of a disclaimer, this is general, if your looking at something more specific like big data, or disparate data then there are other models for such a thing.  But in general, for most tasks, keep it simple, keep it generic.

Ask a Question
Discussion stats
  • 2 replies
  • 1 like
  • 3 in conversation