Hi guys,
suppose to have a number of folders:
Folder Tokyo: "C:\Users\....\Tokyo"
Folder NYC: "C:\Users\....\NYC"
Folder Singapore: "C:\Users\....\Singapore"
Each folder contains different files named identical across the folders: Hospitalization_flu, Hospitalization_Covid, ....
The files appear like this:
data Hospitalization_flu ;
input ID:$20. Admission :date09. Discharge :date09. Sex Morbidity;
format Admission date9. Discharge date9.;
cards;
0001 13JAN2017 25JAN2017 M Yes
0001 22FEB2018 03MAR2018 M No
0001 22FEB2018 03MAR2018 M Yes
0002 01DEC2016 14DEC2016 F No
0002 01DEC2016 14DEC2016 F No
0002 25DEC2017 02JAN2018 F No
0002 06JAN2018 09JAN2018 F Yes
;run;
I would like to append (vertically with set) all files with the same name from different folders, for example all "Hospitalization_flu" files to generate a unique Hospitalization_flu file but:
The folders are around 20 and so totally I have 20 files named for example "Hospitalization_flu", 20 named "Hospitalization_Covid" and so on. Moreover the files contain around 70 variables and to sort them manually is not feasible.
Note that IDs present in different files/folders are unique since from an anonymization process and files are SAS datasets.
Can anyone help me please to automate the process?
Thank you in advance
Is this a one time thing? or will the list of folders (or list of datasets) change over time and you need something that will detect the new folders and/or new datasets and generate the appropriate code?
You need to break the problem into pieces and figure out how you will solve each one.
If you know the names of the datasets and have librefs already defined pointing to the folders then the code you want to run will be something like this:
data Hospitalization_flu ;
length source $8 ;
set Tokyo.Hospitalization_flu
NYC.Hospitalization_flu
Singapor.Hospitalization_flu
indsname=dsname
;
source = scan(indsname,1,'.');
run;
Which will use the INDSNAME= option of the SET statement to get the name of the dataset that contributed the current observation into a temporary variable. The SCAN() function can then be used to pull out the libref part of the name. Note that the libref values are limited to 8 characters.
If you need help finding out what datasets exist so that the code can be automated then you probably want to use a tool that can find the names of all of the files in a directory tree. So something like: https://github.com/sasutils/macros/blob/master/dirtree.sas
Once you have the list you can use it to generate code by just using data steps (macros are not really needed if the goal is just to combine all of them).
So something like:
* Get list of files ;
%dirtree(\\Users\xxx\TopFolder,out=files,maxdepth=2)
* Subset to SAS datasets ;
* Generate libref for each path ;
data datasets;
set files;
where scan(file,-1,'.')='sas7bdat';
by path;
length libref $8 ;
retain libref rc;
if first.path then do;
llbref=' ';
rc=libname(libref,path);
end;
run;
Now that you re-order the data by member name instead and use it to generate the data step.
proc sort data=datasets;
by filename path;
run;
filename code temp;
data _null_;
file code lrecl=75 ;
set datasets;
by filename ;
length memname $32 dsname $41 ;
memname=scan(filename,1,'.');
dsname=catx('.',libef,memname);
if first.filename then put
'data ' memname ';'
/ ' set ' @
;
put dsname @ ;
if last.filename then put
'indsname=dsname;'
/ ' length libref $8;'
/ ' libref=scan(dsname,1,".");'
/ 'run;'
;
run;
Definie "automate".
If you need to repeat the jobs and perdform updates due to new folders and variables, sure, I would build macro programs to do the job.
But it's not a super quick task, but here are som general thoughts.
You need a libname for each folder, so have a list/dataset with paths and name for each libref, use a data step with call execute to assign them.
If they are already assigned, you could go the other way around and fetch this informaiton from DICTIONARY.LIBNAMES using proc sql;
In macro lop over the dat stes from each folder, potentially build a view on each table where you add the folder path in a new variable, and at the end concatenate all of them. If your datsets are not big you would be fine using a KEEP statement in the data step that concatenets the data. Otherwise generate a KEEP= data set option when you first read each data set.
@NewUsrStat wrote:
Hi, yes they are sas datasets
I hope someone made sure that all of the like named data sets have the same named variables and that the same named variables all have the same type, numeric or character, and that the Lengths of the variables are the same.
Otherwise combining data set with different types for the same variable will fail with errors and different lengths may result in truncation of data values.
Is this a one time thing? or will the list of folders (or list of datasets) change over time and you need something that will detect the new folders and/or new datasets and generate the appropriate code?
You need to break the problem into pieces and figure out how you will solve each one.
If you know the names of the datasets and have librefs already defined pointing to the folders then the code you want to run will be something like this:
data Hospitalization_flu ;
length source $8 ;
set Tokyo.Hospitalization_flu
NYC.Hospitalization_flu
Singapor.Hospitalization_flu
indsname=dsname
;
source = scan(indsname,1,'.');
run;
Which will use the INDSNAME= option of the SET statement to get the name of the dataset that contributed the current observation into a temporary variable. The SCAN() function can then be used to pull out the libref part of the name. Note that the libref values are limited to 8 characters.
If you need help finding out what datasets exist so that the code can be automated then you probably want to use a tool that can find the names of all of the files in a directory tree. So something like: https://github.com/sasutils/macros/blob/master/dirtree.sas
Once you have the list you can use it to generate code by just using data steps (macros are not really needed if the goal is just to combine all of them).
So something like:
* Get list of files ;
%dirtree(\\Users\xxx\TopFolder,out=files,maxdepth=2)
* Subset to SAS datasets ;
* Generate libref for each path ;
data datasets;
set files;
where scan(file,-1,'.')='sas7bdat';
by path;
length libref $8 ;
retain libref rc;
if first.path then do;
llbref=' ';
rc=libname(libref,path);
end;
run;
Now that you re-order the data by member name instead and use it to generate the data step.
proc sort data=datasets;
by filename path;
run;
filename code temp;
data _null_;
file code lrecl=75 ;
set datasets;
by filename ;
length memname $32 dsname $41 ;
memname=scan(filename,1,'.');
dsname=catx('.',libef,memname);
if first.filename then put
'data ' memname ';'
/ ' set ' @
;
put dsname @ ;
if last.filename then put
'indsname=dsname;'
/ ' length libref $8;'
/ ' libref=scan(dsname,1,".");'
/ 'run;'
;
run;
The attachment works when the type of variables across all datasets is same. If not, the code must be modified to handle this mismatch, and errors will occur while the code is running.
Please update the rows about the 'libname' as you need.
Edit by KB:
The code in a code window, so nobody needs to view/download attachments:
libname folder1 "path_to_folder1";
libname folder2 "path_to_folder2";
libname folder3 "path_to_folder3";
libname folder4 "path_to_folder4";
libname placed "path_to_you_wanna_place_dataset";
options validvarname=upcase;
/*get the informations about all datasets of all folder*/
proc sql noprint;
create table vcolumn as
select distinct libname,memname,name,type,length from sashelp.vcolumn
where libname like 'FOLDER%' and memtype='DATA' ;
quit;
/*get properties of all unique variables*/
proc sql noprint;
create table vcol as
select distinct memname,name,type,max(length) as length
from vcolumn group by memname,name;
quit;
proc sort;by memname name type;run;
data attributes_var;
length attr $10000 attr_var $200;
retain attr;
set vcol;
by memname name type;
if type in ('char') then attr_var=catx(' ',name,cats('char(',put(length,best.),')'));
else attr_var=catx(' ',name,'num');
if first.memname then attr=strip(attr_var);
else attr=catx(' , ',attr,attr_var);
if last.memname then attr=cats('(',attr,');');
if last.memname then output;
run;
/*get datasets to set*/
proc sql noprint;
create table vname as
select distinct memname,libname from vcolumn;
quit;
proc sort;by memname libname;run;
data attributes_dataset;
length attr $10000 attr_dataset $200;
retain attr;
set vname;
by memname libname;
attr_dataset=catx('.',libname,memname);
if first.memname then attr=strip(attr_dataset);
else attr=catx(' ',attr,attr_dataset);
if last.memname then output;
run;
proc sql noprint;
select count(distinct memname) into: nloop trimmed from attributes_dataset;
quit;
%put &nloop.;
%macro loop(placed_folder=);
%do i=1 %to &nloop.;
data _null_;
set attributes_dataset;
if _n_=&i.;
call symputx('name',strip(memname),'G');
call symputx('datasets',strip(attr),'G');
run;
proc sql noprint;
select distinct strip(attr) into: vars
from attributes_var where memname in ("&name.");
quit;
%put &i./&nloop.==>&name. &datasets.<==>&vars.;
proc sql noprint;
create table temp
&vars.;
quit;
data &placed_folder..&name.;
set temp &datasets.;
run;
%end;
%mend loop;
/*the arguement of 'placed_folder' is the libname to place the combined dataset*/
%loop(placed_folder=placed);
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.