Hi everyone,
I have 13 files per year, one file per year (from year 1998 to year 2010). Each file contains 22 datasets. (in each year-file, i have 22 datasets corresponding to one data set per region, I have 22 region).
In the 22 datasets of each year file i want to create a variable year equal to the year of each data set.
I put two pictures to give an example of what I want:
the first picture shows the year file and the second shows what contains the file for year 1998.
What I've done until now is create a macro for each different year, but I though maybe there is a way to put a macro in a macro in order to avoid copying pasting the code 13 times for each year.
I hope you can help me on that!
Thank you in advance,
Eugéni
Ok, a bit clearer now. Right so something like;
data _null_; do yr=1998 to 2010; call execute(cats('libname tmp "c:\pathtofiles\DADS Postes ',put(yr,4.),'";')); call execute(cats('data total',put(yr,4.),'; set post: indsname=tmp; year=substr(tmp,5,); run;'));
call execute('libname tmp clear;');
end;
run;
data final;
set total:;
run;
The first loop creates code which creates a libname to the year, combines all the datasets with prefix "post" - from your screenshot - into one bg dataset for the year called total<year>. Then clears the libname and loops round for the next year. Once all that is done then the final datastep puts all the total<year> datasets together in on final dataset. Note this assumes all the files have the same structure.
Pictures are not very helpful.
As for your question, without seeing some code its hard to say, but say I have:
%macro Add_Var (lib,ds=); data &lib..&ds.; set &lib..&ds.; new_var=1; run; %mend Add_Var; /* Now I use Base SAS to call that macro over and over */ data _null_; do yr=1998 to 2010; do reg="ABC","DEF"; call execute(cats('%Add_Var (lib=',put(yr,4.),',ds=',reg,');')); end; end; run;
Now the real question here however is why do you have all your data split up into little bits. It is never a good idea, and will make all your programming much harder. Put it all together in one dataset with a variable for year, and variable for region, and your coding will be simpler and faster as you will use by groups.
This clear as mud.
Are you trying to add a year variable to multiple data sets based on their folder or name of the file?
And you need to do it multiple times for multiple folders?
Do you have code that works for one?
Is there a naming convention that helps to define the folders/year?
Are you planning to combine these at some point?
Sorry if my message was not clear enough.
Are you trying to add a year variable to multiple data sets based on their folder or name of the file? Yes
And you need to do it multiple times for multiple folders? yes
This is the code that works for one year-file (2002 file):
Libname data02 'P:\2002';
%macro year_2002(dn);
data data02.&dn;
set data02.&dn;
year=2002;
run;
% mend year_2002;
proc contents data=data02._all_ noprint out=contents;
run;
data _null_;
set contents;
by memname;
call execute (cats('%year_2002(',memname,')'));
It adds the variable year=2002 to every dataset contained in file 2002.
Is there a naming convention that helps to define the folders/year?
Yes my folders are called: DADS Postes 1998; DADS Postes 1999; DADS Postes 2000 etc until 2010; so there is a repeting convention on the name of the files.
Are you planning to combine these at some point?
I will append files contained in each year file (all regions for 2002, all regions for 2003 etc) but even if I do this before, i do not know how to create the variable year for all files if year is not equal in all files.
Thank you very much, I hope I made things clearer !
Eugenie
You might be able to leave your files along and use the INDSNAME= option on the INFILE statement to get at the dataset/folder name.
For example you could create separated libnames for each folder.
libname DADS1998 'DADS Postes 1998';
libname DADS1999 'DADS Postes 1999';
libname DADS2000 'DADS Postes 2000';
Then if each libref has a member named MYDATA you could make a temporary dataset that combines three years into one dataset.
data mydata ;
length dsn $41 year 8;
set dads1998.mydata dads1999.mydata dads2000.mydata indsname=dsn;
year = input(substr(dsn,5),4.);
run;
Or you could skip the librefs and just reference the datasets by filename. Make the variable that you use with INDSNAME longer.
data mydata ;
length dsn $256 year 8;
set
'DADS Postes 1998/mydata'
'DADS Postes 1999/mydata'
'DADS Postes 2000/mydata'
indsname=dsn
;
year = input(scan(scan(dsn,-2,'/\'),-1,' '),4.);
run;
Ok, a bit clearer now. Right so something like;
data _null_; do yr=1998 to 2010; call execute(cats('libname tmp "c:\pathtofiles\DADS Postes ',put(yr,4.),'";')); call execute(cats('data total',put(yr,4.),'; set post: indsname=tmp; year=substr(tmp,5,); run;'));
call execute('libname tmp clear;');
end;
run;
data final;
set total:;
run;
The first loop creates code which creates a libname to the year, combines all the datasets with prefix "post" - from your screenshot - into one bg dataset for the year called total<year>. Then clears the libname and loops round for the next year. Once all that is done then the final datastep puts all the total<year> datasets together in on final dataset. Note this assumes all the files have the same structure.
Thank you for your answer!
I understand the logic, nevertheless it does not work.
The error message appears at this point:
call execute(cats('data total',put(yr,4.),'; set post: indsname=tmp; year=substr(tmp,5,); run;'));
It says: " the data set list (work.post:) does not contain any member". That means maybe that the library tmp was not correctly created so it can not find the datasets beginning by "post"?
It creates 14 datafiles total1998, total1999 etc but they are all empty, with 0 observations
@eugenia67 wrote:
Thank you for your answer!
I understand the logic, nevertheless it does not work.
The error message appears at this point:
call execute(cats('data total',put(yr,4.),'; set post: indsname=tmp; year=substr(tmp,5,); run;'));It says: " the data set list (work.post:) does not contain any member". That means maybe that the library tmp was not correctly created so it can not find the datasets beginning by "post"?
It creates 14 datafiles total1998, total1999 etc but they are all empty, with 0 observations
You need to adapt the code. In this case it was designed to work off the work library. Change it so it uses your libname reference from the previous line. See the modified code below and the addition of tmp. in front of the data set name post.
data _null_; do yr=1998 to 2010; call execute(cats('libname tmp "c:\pathtofiles\DADS Postes ',put(yr,4.),'";')); call execute(cats('data total',put(yr,4.),'; set tmp.post: indsname=tmp; year=substr(tmp,5,); run;'));
call execute('libname tmp clear;');
end;
run;
data final;
set total:;
run;
I did put tmp.postes
Actually it was not working because
"c:\pathtofiles\DADS Postes ',put(yr,4.),'";'));
the program was not recognizing DADS postes 1998 because of the space between postes and 1998. I removed the space in the datafile and it worked!
Thank you very much
@eugenia67 wrote:
I did put tmp.postes
Not in the code you posted and that's all we can see.
Assumptions:
You have 13 folders.
Each folder has 22 datasets for a specific region
You want to combine all the datasets in the folder into one data set per year, with a variable for year and region name.
Here's some untested code as a sample. It apends the
%macro append_data(path, year);
libname myfiles "&path";
data year&year.;
length source region $50.;
set post: indsname = source;
region = source;
year= &year;
run;
libname myfiles;
%mend;
And then call it 13 times:
%append_data(P:\2002\, 2002);
%append_data(P:\2003\, 2003);
....
...
You can make it more dynamic if you want using a data step and CALL EXECUTE to call the macro.
I'll leave that as an exercise for you, with the note that CALL EXECUTE documentation has a good example of implementing this.
Hope this helps 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.