Hi,
I am working with multiple data sets. There are a total of 4 data sets per year (~250 variables total across all 4 datasets and I have to keep all of them),and I have 10 years of data.
I need help coming up with a macro to minimize the code I have to write.
Here is a list of the 4 data sets for year 2000 (I have 4 data sets each for years 2000-2015):
*List of data sets for year 2000;
clin_2000_phys_exam
clin_2000_lab
clin_2000_group
clin_2000_final
*List of data sets for year 2001;
clin_2001_phys_exam
clin_2001_lab
clin_2001_group
clin_2001_final
......and this repeats until we reach year 2010
What I am trying to do is something like this for each year:
*Merging "clin_2000_phys_exam" data with "clin_2000_lab" data by form;
proc sort data=clin_2000_phys_exam; by form; run;
proc sort data=clin_2000_lab; by form; run;
data phys_lab_2000;
merge clin_2000_phys_exam clin_2000_lab;
by form;
run;
*Merging the newly created "phys_lab_2000" data with "clin_2000_group" and "clin_2000_final" data by ID;
proc sort data=phys_lab_2000; by ID; run;
proc sort data=clin_2000_group; by ID; run;
proc sort data=clin_2000_final; by ID; run;
data 2000_final data;
merge phys_lab_2000 clin_2000_group clin_2000_final;
by ID;
run;
*Repeat the code above for all 10 years (years 2000-2010) until each year has a merged final dataset;
*Concatenate data together to create a long dataset that contains data for all 10 years;
data year200_year2011;
set 2000_final data
2001_final data
2002_final data
2003_final data
2004_final data
2005_final data
2006_final data
2007_final data
2008_final data
2009_final data
20010_final data;
run;
I need help coming up with a SAS macro that does this so that I don't have to rewrite the same line of code for each year since the only difference would be the name of the dataset.
I appreciate any help you can provide.
It looks like you know precisely what code needs to run for each year. Here's an example of turning it into a macro. I'll start with a piece of the code you supplied:
proc sort data=phys_lab_2000; by ID; run;
proc sort data=clin_2000_group; by ID; run;
proc sort data=clin_2000_final; by ID; run;
data 2000_final data;
merge phys_lab_2000 clin_2000_group clin_2000_final;
by ID;
run;
You might want to add to that code, since you have indicated that more processing does take place for each year. Turning it into a macro that processes one year would look like this. It's as simple as replacing "2000" with "&year.":
%macro one_year (year=);
proc sort data=phys_lab_&year.; by ID; run;
proc sort data=clin_&year._group; by ID; run;
proc sort data=clin_&year._final; by ID; run;
data &year._final data;
merge phys_lab_&year. clin_&year._group clin_&year._final;
by ID;
run;
%mend one_year;
Then actually executing the code looks like this:
%one_year (year=2000)
%one_year (year=2001)
etc.
Note that you will need to change the names of one of your data sets. Data set names cannot begin with a number, so this would be an invalid name:
2000_final data
It looks like you know precisely what code needs to run for each year. Here's an example of turning it into a macro. I'll start with a piece of the code you supplied:
proc sort data=phys_lab_2000; by ID; run;
proc sort data=clin_2000_group; by ID; run;
proc sort data=clin_2000_final; by ID; run;
data 2000_final data;
merge phys_lab_2000 clin_2000_group clin_2000_final;
by ID;
run;
You might want to add to that code, since you have indicated that more processing does take place for each year. Turning it into a macro that processes one year would look like this. It's as simple as replacing "2000" with "&year.":
%macro one_year (year=);
proc sort data=phys_lab_&year.; by ID; run;
proc sort data=clin_&year._group; by ID; run;
proc sort data=clin_&year._final; by ID; run;
data &year._final data;
merge phys_lab_&year. clin_&year._group clin_&year._final;
by ID;
run;
%mend one_year;
Then actually executing the code looks like this:
%one_year (year=2000)
%one_year (year=2001)
etc.
Note that you will need to change the names of one of your data sets. Data set names cannot begin with a number, so this would be an invalid name:
2000_final data
Though I don't have your data, hope this helps.
/*A sample dataset that mimics the shape of your dataset*/
%macro repeat1;
%do repeat=2000 %to 2010;
data clin_&repeat._phys_exam clin_&repeat._lab(drop=id) clin_&repeat._group(drop=form) clin_&repeat._final(drop=form);
do form=1,2;
do id=1,2;
output clin_&repeat._phys_exam;
if id=1 then output clin_&repeat._lab;
if form=1 then output clin_&repeat._group;
if form=1 then output clin_&repeat._final;
end;
end;
run;
%end;
%mend;
%repeat1;
/*The code that does what you need*/
%macro repeat2;
%do repeat=2000 %to 2010;
proc sort data=clin_&repeat._phys_exam;
by form;
run;
proc sort data=clin_&repeat._lab;
by form;
run;
data phys_lab_&repeat.;
merge clin_&repeat._phys_exam clin_&repeat._lab;
by form;
run;
proc sort data=phys_lab_&repeat.;
by id;
run;
proc sort data=clin_&repeat._group;
by id;
run;
proc sort data=clin_&repeat._final;
by id;
run;
data phys_lab_&repeat.;
merge phys_lab_&repeat. clin_&repeat._group clin_&repeat._final;
by id;
run;
proc append base=year2000_year2010 data=phys_lab_&repeat.;
run;
%end;
%mend;
%repeat2;
Though this may not be the best solution, this may do something you mentioned.
Here's a quick tutorial on turning working code into a macro, which is what you want to do here:
https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.