Hi,
I am working with multiple data sets. There are a total of 4 data sets per year (~250 variables total across all 4 datasets and I have to keep all of them),and I have 10 years of data.
I need help coming up with a macro to minimize the code I have to write.
Here is a list of the 4 data sets for year 2000 (I have 4 data sets each for years 2000-2015):
*List of data sets for year 2000;
clin_2000_phys_exam
clin_2000_lab
clin_2000_group
clin_2000_final
*List of data sets for year 2001;
clin_2001_phys_exam
clin_2001_lab
clin_2001_group
clin_2001_final
......and this repeats until we reach year 2010
What I am trying to do is something like this for each year:
*Merging "clin_2000_phys_exam" data with "clin_2000_lab" data by form;
proc sort data=clin_2000_phys_exam; by form; run;
proc sort data=clin_2000_lab; by form; run;
data phys_lab_2000;
merge clin_2000_phys_exam clin_2000_lab;
by form;
run;
*Merging the newly created "phys_lab_2000" data with "clin_2000_group" and "clin_2000_final" data by ID;
proc sort data=phys_lab_2000; by ID; run;
proc sort data=clin_2000_group; by ID; run;
proc sort data=clin_2000_final; by ID; run;
data 2000_final data;
merge phys_lab_2000 clin_2000_group clin_2000_final;
by ID;
run;
*Repeat the code above for all 10 years (years 2000-2010) until each year has a merged final dataset;
*Concatenate data together to create a long dataset that contains data for all 10 years;
data year200_year2011;
set 2000_final data
2001_final data
2002_final data
2003_final data
2004_final data
2005_final data
2006_final data
2007_final data
2008_final data
2009_final data
20010_final data;
run;
I need help coming up with a SAS macro that does this so that I don't have to rewrite the same line of code for each year since the only difference would be the name of the dataset.
I appreciate any help you can provide.
It looks like you know precisely what code needs to run for each year. Here's an example of turning it into a macro. I'll start with a piece of the code you supplied:
proc sort data=phys_lab_2000; by ID; run;
proc sort data=clin_2000_group; by ID; run;
proc sort data=clin_2000_final; by ID; run;
data 2000_final data;
merge phys_lab_2000 clin_2000_group clin_2000_final;
by ID;
run;
You might want to add to that code, since you have indicated that more processing does take place for each year. Turning it into a macro that processes one year would look like this. It's as simple as replacing "2000" with "&year.":
%macro one_year (year=);
proc sort data=phys_lab_&year.; by ID; run;
proc sort data=clin_&year._group; by ID; run;
proc sort data=clin_&year._final; by ID; run;
data &year._final data;
merge phys_lab_&year. clin_&year._group clin_&year._final;
by ID;
run;
%mend one_year;
Then actually executing the code looks like this:
%one_year (year=2000)
%one_year (year=2001)
etc.
Note that you will need to change the names of one of your data sets. Data set names cannot begin with a number, so this would be an invalid name:
2000_final data
It looks like you know precisely what code needs to run for each year. Here's an example of turning it into a macro. I'll start with a piece of the code you supplied:
proc sort data=phys_lab_2000; by ID; run;
proc sort data=clin_2000_group; by ID; run;
proc sort data=clin_2000_final; by ID; run;
data 2000_final data;
merge phys_lab_2000 clin_2000_group clin_2000_final;
by ID;
run;
You might want to add to that code, since you have indicated that more processing does take place for each year. Turning it into a macro that processes one year would look like this. It's as simple as replacing "2000" with "&year.":
%macro one_year (year=);
proc sort data=phys_lab_&year.; by ID; run;
proc sort data=clin_&year._group; by ID; run;
proc sort data=clin_&year._final; by ID; run;
data &year._final data;
merge phys_lab_&year. clin_&year._group clin_&year._final;
by ID;
run;
%mend one_year;
Then actually executing the code looks like this:
%one_year (year=2000)
%one_year (year=2001)
etc.
Note that you will need to change the names of one of your data sets. Data set names cannot begin with a number, so this would be an invalid name:
2000_final data
Though I don't have your data, hope this helps.
/*A sample dataset that mimics the shape of your dataset*/
%macro repeat1;
%do repeat=2000 %to 2010;
data clin_&repeat._phys_exam clin_&repeat._lab(drop=id) clin_&repeat._group(drop=form) clin_&repeat._final(drop=form);
do form=1,2;
do id=1,2;
output clin_&repeat._phys_exam;
if id=1 then output clin_&repeat._lab;
if form=1 then output clin_&repeat._group;
if form=1 then output clin_&repeat._final;
end;
end;
run;
%end;
%mend;
%repeat1;
/*The code that does what you need*/
%macro repeat2;
%do repeat=2000 %to 2010;
proc sort data=clin_&repeat._phys_exam;
by form;
run;
proc sort data=clin_&repeat._lab;
by form;
run;
data phys_lab_&repeat.;
merge clin_&repeat._phys_exam clin_&repeat._lab;
by form;
run;
proc sort data=phys_lab_&repeat.;
by id;
run;
proc sort data=clin_&repeat._group;
by id;
run;
proc sort data=clin_&repeat._final;
by id;
run;
data phys_lab_&repeat.;
merge phys_lab_&repeat. clin_&repeat._group clin_&repeat._final;
by id;
run;
proc append base=year2000_year2010 data=phys_lab_&repeat.;
run;
%end;
%mend;
%repeat2;
Though this may not be the best solution, this may do something you mentioned.
Here's a quick tutorial on turning working code into a macro, which is what you want to do here:
https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.