BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eugenia67
Fluorite | Level 6

Hi everyone, 

 

I have 13 files per year, one file per year (from year 1998 to year 2010). Each file contains 22 datasets. (in each year-file, i have 22 datasets corresponding to one data set per region, I have 22 region). 

 

In the 22 datasets of each year file i want to create a variable year equal to the year of each data set. 

 

I put two pictures to give an example of what I want: 

the first picture shows the year file and the second shows what contains the file for year 1998.

 

What I've done until now is create a macro for each different year, but I though maybe there is a way to put a macro in a macro in order to avoid copying pasting the code 13 times for each year.

 

I hope you can help me on that! 

 

Thank you in advance, 

EugéniYear filesYear filesDatasets in a year fileDatasets in a year file

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, a bit clearer now.  Right so something like;

data _null_;
  do yr=1998 to 2010;
    call execute(cats('libname tmp "c:\pathtofiles\DADS Postes ',put(yr,4.),'";'));
    call execute(cats('data total',put(yr,4.),'; set post: indsname=tmp; year=substr(tmp,5,); run;'));
call execute('libname tmp clear;');
end;
run;

data final;
set total:;
run;

The first loop creates code which creates a libname to the year, combines all the datasets with prefix "post" - from your screenshot - into one bg dataset for the year called total<year>.  Then clears the libname and loops round for the next year.  Once all that is done then the final datastep puts all the total<year> datasets together in on final dataset.  Note this assumes all the files have the same structure.

View solution in original post

10 REPLIES 10
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Pictures are not very helpful.

As for your question, without seeing some code its hard to say, but say I have:

%macro Add_Var (lib,ds=);
  data &lib..&ds.;
    set &lib..&ds.;
    new_var=1;
  run;
%mend Add_Var;
 
/* Now I use Base SAS to call that macro over and over */
data _null_;
  do yr=1998 to 2010;
    do reg="ABC","DEF";
      call execute(cats('%Add_Var (lib=',put(yr,4.),',ds=',reg,');'));
    end;
  end;
run;

Now the real question here however is why do you have all your data split up into little bits.  It is never a good idea, and will make all your programming much harder.  Put it all together in one dataset with a variable for year, and variable for region, and your coding will be simpler and faster as you will use by groups.

Reeza
Super User

This clear as mud. 

Are you trying to add a year variable to multiple data sets based on their folder or name of the file?

 

And you need to do it multiple times for multiple folders?

 

Do you have code that works for one?

 

Is there a naming convention that helps to define the folders/year?

 

Are you planning to combine these at some point?

eugenia67
Fluorite | Level 6

Sorry if my message was not clear enough.

 

Are you trying to add a year variable to multiple data sets based on their folder or name of the file? Yes

And you need to do it multiple times for multiple folders? yes

 

This is the code that works for one year-file (2002 file): 

 

Libname data02 'P:\2002'; 

 

%macro year_2002(dn); 

data data02.&dn; 

set data02.&dn; 

year=2002; 

run; 

% mend year_2002; 

 

proc contents data=data02._all_ noprint out=contents; 

run; 

 

data _null_; 

set contents; 

by memname; 

call execute (cats('%year_2002(',memname,')')); 

 

It adds the variable year=2002 to every dataset contained in file 2002.

 

Is there a naming convention that helps to define the folders/year? 

Yes my folders are called: DADS Postes 1998; DADS Postes 1999; DADS Postes 2000 etc until 2010; so there is a repeting convention on the name of the files.

 

Are you planning to combine these at some point? 

I will append files contained in each year file (all regions for 2002, all regions for 2003 etc) but even if I do this before, i do not know how to create the variable year for all files if year is not equal in all files.

 

Thank you very much, I hope I made things clearer ! 

 

Eugenie

Tom
Super User Tom
Super User

You might be able to leave your files along and use the INDSNAME= option on the INFILE statement to get at the dataset/folder name.

For example you could create separated libnames for each folder.

libname DADS1998 'DADS Postes 1998';
libname DADS1999 'DADS Postes 1999';
libname DADS2000 'DADS Postes 2000';

Then if each libref has a member named MYDATA you could make a temporary dataset that combines three years into one dataset.

data mydata ;
  length dsn $41 year 8;
  set dads1998.mydata dads1999.mydata dads2000.mydata indsname=dsn;
  year = input(substr(dsn,5),4.);
run;

Or you could skip the librefs and just reference the datasets by filename.  Make the variable that you use with INDSNAME longer.

data mydata ;
  length dsn $256 year 8;
  set 
    'DADS Postes 1998/mydata'
    'DADS Postes 1999/mydata'
    'DADS Postes 2000/mydata'
    indsname=dsn
  ;
  year = input(scan(scan(dsn,-2,'/\'),-1,' '),4.);
run;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, a bit clearer now.  Right so something like;

data _null_;
  do yr=1998 to 2010;
    call execute(cats('libname tmp "c:\pathtofiles\DADS Postes ',put(yr,4.),'";'));
    call execute(cats('data total',put(yr,4.),'; set post: indsname=tmp; year=substr(tmp,5,); run;'));
call execute('libname tmp clear;');
end;
run;

data final;
set total:;
run;

The first loop creates code which creates a libname to the year, combines all the datasets with prefix "post" - from your screenshot - into one bg dataset for the year called total<year>.  Then clears the libname and loops round for the next year.  Once all that is done then the final datastep puts all the total<year> datasets together in on final dataset.  Note this assumes all the files have the same structure.

eugenia67
Fluorite | Level 6

Thank you for your answer!

 

I understand the logic, nevertheless it does not work.

 

The error message appears at this point: 

 call execute(cats('data total',put(yr,4.),'; set post: indsname=tmp; year=substr(tmp,5,); run;'));

It says: " the data set list (work.post:) does not contain any member". That means maybe that the library tmp was not correctly created so it can not find the datasets beginning by "post"?

 

It creates 14 datafiles total1998, total1999 etc but they are all empty, with 0 observations

Reeza
Super User

@eugenia67 wrote:

Thank you for your answer!

 

I understand the logic, nevertheless it does not work.

 

The error message appears at this point: 

 call execute(cats('data total',put(yr,4.),'; set post: indsname=tmp; year=substr(tmp,5,); run;'));

It says: " the data set list (work.post:) does not contain any member". That means maybe that the library tmp was not correctly created so it can not find the datasets beginning by "post"?

 

It creates 14 datafiles total1998, total1999 etc but they are all empty, with 0 observations


You need to adapt the code. In this case it was designed to work off the work library. Change it so it uses your libname reference from the previous line. See the modified code below and the addition of tmp. in front of the data set name post.

 

data _null_;
  do yr=1998 to 2010;
    call execute(cats('libname tmp "c:\pathtofiles\DADS Postes ',put(yr,4.),'";'));
    call execute(cats('data total',put(yr,4.),'; set tmp.post: indsname=tmp; year=substr(tmp,5,); run;'));
call execute('libname tmp clear;');
end;
run;

data final;
set total:;
run;

  

eugenia67
Fluorite | Level 6

I did put tmp.postes

 

Actually it was not working because 

"c:\pathtofiles\DADS Postes ',put(yr,4.),'";'));

the program was not recognizing DADS postes 1998 because of the space between postes and 1998. I removed the space in the datafile and it worked!

 

Thank you very much 

Reeza
Super User

@eugenia67 wrote:

I did put tmp.postes

 

 


Not in the code you posted and that's all we can see.

Reeza
Super User

Assumptions:

You have 13 folders.

Each folder has 22 datasets for a specific region

You want to combine all the datasets in the folder into one data set per year, with a variable for year and region name.

 

Here's some untested code as a sample. It apends the 

 

 

%macro append_data(path, year);

libname myfiles "&path";

data year&year.;
length source region $50.;
set post: indsname = source;
region = source;
year= &year;
run;

libname myfiles;

%mend;

 

And then call it 13 times:

 

%append_data(P:\2002\, 2002);
%append_data(P:\2003\, 2003);
....


...

You can make it more dynamic if you want using a data step and CALL EXECUTE to call the macro. 

I'll leave that as an exercise for you, with the  note that CALL EXECUTE documentation has a good example of implementing this.

 

Hope this helps 🙂

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2492 views
  • 3 likes
  • 4 in conversation