BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Fluorite | Level 6


I have multiple datasets whose name has the following format: data_yyyymm.  For example, the names of the datasets are data_202211, data_202212, data_202301, and so on.  They all have same variables with year and month indicators in each dataset. I want to append these data so that I would have one master dataset that have all these datasets. I was thinking of using the following commands to work:


data masterdataset;
set data_202211;
do i = 202212  to 202307;
      some kinds of commands for append

I was not sure how I could make this code work so that I can have one master dataset. 

Could anyone give me some suggestion? Thank you in advance for your time and help!


Accepted Solutions
Super User

What about shortcut lists instead?


data master;
set data_202211-data_202212 data_2023: ;

View solution in original post

Super User

What about shortcut lists instead?


data master;
set data_202211-data_202212 data_2023: ;
Fluorite | Level 6

That is an amazing alternative and it worked perfectly!

Thank you very much for your help!

Jade | Level 19

@Reeza's reply is compact and effective:


What about shortcut lists instead?


data master;
set data_202211-data_202212 data_2023: ;

But if this compact SET statement implies reading of a large number of datasets, just be aware that computer memory is allocated to build a buffer for each dataset.


I have dealt with this situation when working with daily TAQ (trade and/or quote) datasets from the us stock market, with names constructed with the pattern CTyyyymmdd.  If one were reading all the data for, say, June and July (i.e. SET CT202306: CT202307:), it would require 42 such buffers - one for each trading day.   And you can be sure many researchers may not want to stop at analyzing two months of data.  


The upshot is that this can become costly to performance.


In certain cases, there is an escape:


If all those datasets had the same variables, then use of the statement


SET CT202306: CT202307:    open=defer;

helps a lot.  The "open=defer" option tells SAS to defer opening the second dataset until the first is fully read. and apply the same principle to each of the datasets.  That is, it tells SAS to re-use the same buffer for each of the 42 datasets.



Even if they have different variables, there is a trivial workaround: create an empty dummy dataset with all the variables (but no observations), and then list it first in the SET statement:

data dummy;
  set ct202306: ct202307: ;

data new;
  set dummy ct202306: ct202307: open=defer;


BUT (you knew another "but" was coming, right?) it still doesn't allow OPEN=DEFER in the presence of a BY statement.  Consider this code (assume each dataset has one DATE value, but is sorted by STOCK_TICKER),


set CT202306: CT202307: open=defer;
by stock_ticker date;

This can't reduce the number of buffers, because SAS has to have all 42 datasets open in order to produce the interleaved sequencing demanded by the BY statement.  In fact, it won't run at all, but will generate



ERROR: OPEN=DEFER is incompatible with the BY statement.
NOTE: The SAS System stopped processing this step because of errors.

in the log.




The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets




Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg



Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 3 in conversation