SAS Programming

master_kim · Posted 08-09-2023 11:34 AM

Hello.

I have multiple datasets whose name has the following format: data_yyyymm. For example, the names of the datasets are data_202211, data_202212, data_202301, and so on. They all have same variables with year and month indicators in each dataset. I want to append these data so that I would have one master dataset that have all these datasets. I was thinking of using the following commands to work:

data masterdataset;
set data_202211;
do i = 202212  to 202307;
      some kinds of commands for append
end;
run;

I was not sure how I could make this code work so that I can have one master dataset.

Could anyone give me some suggestion? Thank you in advance for your time and help!

Reeza · Posted 08-09-2023 11:38 AM

What about shortcut lists instead?

data master;
set data_202211-data_202212 data_2023: ;
run;

View solution in original post

Reeza · Posted 08-09-2023 11:38 AM

What about shortcut lists instead?

data master;
set data_202211-data_202212 data_2023: ;
run;

master_kim · Posted 08-09-2023 12:42 PM

That is an amazing alternative and it worked perfectly!

Thank you very much for your help!

mkeintz · Posted 08-09-2023 12:45 PM

@Reeza's reply is compact and effective:

What about shortcut lists instead?
data master;
set data_202211-data_202212 data_2023: ;
run;

But if this compact SET statement implies reading of a large number of datasets, just be aware that computer memory is allocated to build a buffer for each dataset.

I have dealt with this situation when working with daily TAQ (trade and/or quote) datasets from the us stock market, with names constructed with the pattern CTyyyymmdd. If one were reading all the data for, say, June and July (i.e. SET CT202306: CT202307:), it would require 42 such buffers - one for each trading day. And you can be sure many researchers may not want to stop at analyzing two months of data.

The upshot is that this can become costly to performance.

In certain cases, there is an escape:

If all those datasets had the same variables, then use of the statement

SET CT202306: CT202307:    open=defer;

helps a lot. The "open=defer" option tells SAS to defer opening the second dataset until the first is fully read. and apply the same principle to each of the datasets. That is, it tells SAS to re-use the same buffer for each of the 42 datasets.

Even if they have different variables, there is a trivial workaround: create an empty dummy dataset with all the variables (but no observations), and then list it first in the SET statement:

data dummy;
  set ct202306: ct202307: ;
  stop;
run;

data new;
  set dummy ct202306: ct202307: open=defer;
  ....
run;

BUT (you knew another "but" was coming, right?) it still doesn't allow OPEN=DEFER in the presence of a BY statement. Consider this code (assume each dataset has one DATE value, but is sorted by STOCK_TICKER),

set CT202306: CT202307: open=defer;
by stock_ticker date;

This can't reduce the number of buffers, because SAS has to have all 42 datasets open in order to produce the interleaved sequencing demanded by the BY statement. In fact, it won't run at all, but will generate

ERROR: OPEN=DEFER is incompatible with the BY statement.
NOTE: The SAS System stopped processing this step because of errors.

in the log.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

Writing a loop to append multiple datasets

Re: Writing a loop to append multiple datasets

Re: Writing a loop to append multiple datasets

Re: Writing a loop to append multiple datasets

Re: Writing a loop to append multiple datasets

proc append resulting in abnormally large dataset

Appending dataset in Macro loop

Append into single dataset

PROC APPEND Alternatives

Macros to loop and append multiple files with year-month name

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...