How do I go through a directory and combine datasets under it? I also need to add a column / variable that identifies contributing dataset (dataset name).
Thanks!
Try the one step approach:
data _null_; set sashelp.vtable (where=(libname="DT")) end=last; if _n_=1 then call execute('data final; length dataset $100; set '); call execute(cat(' ',strip(name)); if last the call execute(' indsname=ds; dataset=ds; run;'); run;
Basically the sashelp.vtable is pulled in, filetering on the where clause. On first record the datastep part is written out, then for each obs returned that is written out and on the final record the datastep is closed, and then that whole code block is run. You can aplpy any manipulations to it you want.
The indsname option on set statement can be used to get dataset name:
data want; set have1 have2 have3 indsname=dsname; run;
You can use the set to append datasets. Proc append can also be used. You could also do union all in proc sql. They all have their pluses and negatives so check out the manual.
One aproach
data want;
Length dataset $ 41;
set lib.file1 lib.file2 indsname=ds;
dataset=ds;
run;
The varaible referenced by indsname option will be a temporary variable so you need to create a permanent one to keep the value as assigned above in the output.
You can use explicit dataset names as in the example or if you have similar names you could use a list:
set lib.sales: indsname=ds; would get all data sets in the library that start with SALES in the name.
Note that you my get warning about lengths of character variables different. The length of the first encountered set will be the length so truncation may be an issue. Also any variables with the same name must be of the same type. If VARX is numeric in one set and character in another this will fail (unless you rename the variable using dataset options).
Thanks for your input, ballardw!
I tested out the code with below:
LIBNAME dt '/data';
LIBNAME rs '/results';
proc sql noprint;
SELECT memname into : names
FROM dictionary.tables
WHERE libname='DT'
;
DATA RS.combined;
LENGTH dataset $ 41;
SET DT.&names indsname=ds;
dataset=ds;
RUN;
It created dataset that only included the first dataset 'ae' under 'DT' as source, with variable 'dataset having value 'DT.ae'. There are about 30 some datasets under 'DT'. How would I improve the above, so it includes all dataset in the 'combined' dataset?
Thanks in advance
First you need to look at the results of %put &names;
Second, you would have had only one reference to the libname.
Warning: writing into the same library can result in appending the same datasets multiple times if you keep running the code as the second time you will get ds.combined appended (and some corresponding issues about the value of the indsname variable)
Easier to create the libname.datasetname construct when you select it.
proc sql;
select catt(libname,'.',memname) into : Names separated by ' '
from dictionary.tables
where libname='DT' and memtype='DATA';
quit;
%put &names; /* so you see what the names are*/
data rs.combined;
length dataset $ 41;
set &names indsname;
dataset=ds;
run;
And if your goal was to combine the sets in DS and DT then use this for the SQL
proc sql;
select catt(libname,'.',memname) into : Names separated by ' '
from dictionary.tables
where libname in ('DT','DS') and memtype='DATA';
quit;
Try the one step approach:
data _null_; set sashelp.vtable (where=(libname="DT")) end=last; if _n_=1 then call execute('data final; length dataset $100; set '); call execute(cat(' ',strip(name)); if last the call execute(' indsname=ds; dataset=ds; run;'); run;
Basically the sashelp.vtable is pulled in, filetering on the where clause. On first record the datastep part is written out, then for each obs returned that is written out and on the final record the datastep is closed, and then that whole code block is run. You can aplpy any manipulations to it you want.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.