BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
master_kim
Fluorite | Level 6

Hello. 

I have multiple datasets whose name has the following format: data_yyyymm.  For example, the names of the datasets are data_202211, data_202212, data_202301, and so on.  They all have same variables with year and month indicators in each dataset. I want to append these data so that I would have one master dataset that have all these datasets. I was thinking of using the following commands to work:

 

data masterdataset;
set data_202211;
do i = 202212  to 202307;
      some kinds of commands for append
end;
run;

I was not sure how I could make this code work so that I can have one master dataset. 

Could anyone give me some suggestion? Thank you in advance for your time and help!

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

What about shortcut lists instead?

 

data master;
set data_202211-data_202212 data_2023: ;
run;

View solution in original post

3 REPLIES 3
Reeza
Super User

What about shortcut lists instead?

 

data master;
set data_202211-data_202212 data_2023: ;
run;
master_kim
Fluorite | Level 6

That is an amazing alternative and it worked perfectly!

Thank you very much for your help!

mkeintz
PROC Star

@Reeza's reply is compact and effective:

 


What about shortcut lists instead?

 

data master;
set data_202211-data_202212 data_2023: ;
run;

But if this compact SET statement implies reading of a large number of datasets, just be aware that computer memory is allocated to build a buffer for each dataset.

 

I have dealt with this situation when working with daily TAQ (trade and/or quote) datasets from the us stock market, with names constructed with the pattern CTyyyymmdd.  If one were reading all the data for, say, June and July (i.e. SET CT202306: CT202307:), it would require 42 such buffers - one for each trading day.   And you can be sure many researchers may not want to stop at analyzing two months of data.  

 

The upshot is that this can become costly to performance.

 

In certain cases, there is an escape:

 

If all those datasets had the same variables, then use of the statement

 

SET CT202306: CT202307:    open=defer;

helps a lot.  The "open=defer" option tells SAS to defer opening the second dataset until the first is fully read. and apply the same principle to each of the datasets.  That is, it tells SAS to re-use the same buffer for each of the 42 datasets.

 

 

Even if they have different variables, there is a trivial workaround: create an empty dummy dataset with all the variables (but no observations), and then list it first in the SET statement:

data dummy;
  set ct202306: ct202307: ;
  stop;
run;

data new;
  set dummy ct202306: ct202307: open=defer;
  ....
run;

 

BUT (you knew another "but" was coming, right?) it still doesn't allow OPEN=DEFER in the presence of a BY statement.  Consider this code (assume each dataset has one DATE value, but is sorted by STOCK_TICKER),

 

set CT202306: CT202307: open=defer;
by stock_ticker date;

This can't reduce the number of buffers, because SAS has to have all 42 datasets open in order to produce the interleaved sequencing demanded by the BY statement.  In fact, it won't run at all, but will generate

 

 

ERROR: OPEN=DEFER is incompatible with the BY statement.
NOTE: The SAS System stopped processing this step because of errors.

in the log.

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 923 views
  • 7 likes
  • 3 in conversation