DATA Step, Macro, Functions and more

Append / combine data sets under a folder with source / contributing data sets identified

Accepted Solution Solved
Reply
Contributor zz
Contributor
Posts: 28
Accepted Solution

Append / combine data sets under a folder with source / contributing data sets identified

How do I go through a directory and combine datasets under it?  I also need to add a column / variable that identifies contributing dataset (dataset name).  

 

Thanks!


Accepted Solutions
Solution
‎05-31-2016 12:47 PM
Super User
Super User
Posts: 7,413

Re: Append / combine data sets under a folder with source / contributing data sets identified

Try the one step approach:

data _null_;
  set sashelp.vtable (where=(libname="DT")) end=last;
  if _n_=1 then call execute('data final; length dataset $100; set ');
  call execute(cat(' ',strip(name));
  if last the call execute(' indsname=ds; dataset=ds; run;');
run;

Basically the sashelp.vtable is pulled in, filetering on the where clause.  On first record the datastep part is written out, then for each obs returned that is written out and on the final record the datastep is closed, and then that whole code block is run.  You can aplpy any manipulations to it you want.

View solution in original post


All Replies
Super User
Super User
Posts: 7,413

Re: Append / combine data sets under a folder with source / contributing data sets identified

The indsname option on set statement can be used to get dataset name:

data want;
  set have1 have2 have3 indsname=dsname;
run;

You can use the set to append datasets.  Proc append can also be used.  You could also do union all in proc sql.  They all have their pluses and negatives so check out the manual.

Super User
Posts: 10,538

Re: Append / combine data sets under a folder with source / contributing data sets identified

One aproach

data want;
   Length dataset $ 41;
   set lib.file1 lib.file2 indsname=ds;
   dataset=ds;
run;

The varaible referenced by indsname option will be a temporary variable so you need to create a permanent one to keep the value as assigned above in the output.

 

You can use explicit dataset names as in the example or if you have similar names you could use a list:

 

set lib.sales: indsname=ds; would get all data sets in the library that start with SALES in the name.

Note that you my get warning about lengths of character variables different. The length of the first encountered set will be the length so truncation may be an issue. Also any variables with the same name must be of the same type. If VARX is numeric in one set and character in another this will fail (unless you rename the variable using dataset options).

Contributor zz
Contributor
Posts: 28

Re: Append / combine data sets under a folder with source / contributing data sets identified

Thanks for your input, ballardw!

 

 

I tested out the code with below:

 

LIBNAME dt '/data';
LIBNAME rs '/results';


proc sql noprint;
SELECT memname into : names
FROM dictionary.tables
WHERE libname='DT'
;


DATA RS.combined;
LENGTH dataset $ 41;
SET DT.&names indsname=ds;
dataset=ds;
RUN;

 

It created dataset that only included the first dataset 'ae' under 'DT' as source, with variable 'dataset having value 'DT.ae'.  There are about 30 some datasets under 'DT'.  How would I improve the above, so it includes all dataset in the 'combined' dataset?

 

Thanks in advance

Super User
Posts: 10,538

Re: Append / combine data sets under a folder with source / contributing data sets identified

[ Edited ]

First you need to look at the results of %put &names;

Second, you would have had only one reference to the libname.

 

Warning: writing into the same library can result in appending the same datasets multiple times if you keep running the code as the second time you will get ds.combined appended (and some corresponding issues about the value of the indsname variable)

 

Easier to create the libname.datasetname construct when you select it.

 

proc sql;
   select catt(libname,'.',memname) into : Names separated by ' '
   from dictionary.tables
   where libname='DT' and memtype='DATA';
quit;

%put &names;    /* so you see what the names are*/
data rs.combined;
   length dataset $ 41;
   set &names indsname;
   dataset=ds;
run;

And if your goal was to combine the sets in DS and DT then use this for the SQL

 

proc sql;
   select catt(libname,'.',memname) into : Names separated by ' '
   from dictionary.tables
   where libname in ('DT','DS') and memtype='DATA';
quit;

 

Solution
‎05-31-2016 12:47 PM
Super User
Super User
Posts: 7,413

Re: Append / combine data sets under a folder with source / contributing data sets identified

Try the one step approach:

data _null_;
  set sashelp.vtable (where=(libname="DT")) end=last;
  if _n_=1 then call execute('data final; length dataset $100; set ');
  call execute(cat(' ',strip(name));
  if last the call execute(' indsname=ds; dataset=ds; run;');
run;

Basically the sashelp.vtable is pulled in, filetering on the where clause.  On first record the datastep part is written out, then for each obs returned that is written out and on the final record the datastep is closed, and then that whole code block is run.  You can aplpy any manipulations to it you want.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 305 views
  • 0 likes
  • 3 in conversation