Hi there,
I am using SAS 9.4.
I have multiple datasets titled as 2007q1, 2007q2, 2007q3, ..., 2017q3, 2017q4. I also have a dataset titled Cohort. All dataset contain the variable patid.
I want to merge all those &yr.q&qr. datasets with the dataset Cohort by the common variable patid. There should be only 1 final merged dataset. The merged file should contain all and only patids that are in the dataset Cohort.
Here is my code:
%macro mergedata;
proc sort data=data.cohort; by patid; run;
%do yr=2007 %to 2017;
%do qr=1 %to 4;
proc sort data=in1perc.ses_diag&yr.q&qr.;
by patid;
run;
data data.merged;
merge
data.cohort (in=ina)
in1perc.&yr.q&qr. (in=inb);
by patid;
if ina and inb;
run;
%end;
%end;
%mend;
%mergedata
I know there is something wrong with the ina and inb part, but I cannot figure out how to fix it.
The log file shows that the next dataset in the loop keeps replacing the previous merged dataset, instead of adding into it.
Here is the log for iterations 2007Q4 and 2008Q1. Note how the total observations has decreased in data.Merged.
NOTE: There were 1687 observations read from the data set DATA.COHORT.
NOTE: There were 712195 observations read from the data set IN1PERC.2007Q4.
NOTE: The data set DATA.MERGED has 7185 observations and 17 variables.
NOTE: There were 1687 observations read from the data set DATA.COHORT.
NOTE: There were 731171 observations read from the data set IN1PERC.2008Q1.
NOTE: The data set DATA.MERGED has 7031 observations and 17 variables.
How do I fix this? Thank you for your help!
If I'm interpreting the question correctly, you would need to make this sort of change. First, delete your data.merged data set. Then change the last part of your loop where you now have:
data data.merged;
merge data.cohort (in=ina) in1perc.&yr.q&qr. (in=inb);
by patid; if ina and inb;
run;
Instead, replace that section with:
data merged;
merge data.cohort (in=ina)
in1perc.&yr.q&qr. (in=inb);
by patid;
if ina and inb;
run;
proc append data=merged base=data.merged;
run;
You can run this, even when DATA.MERGED does not yet exist. For that case, MERGED will get copied to DATA.MERGED.
Instead of creating data.merged in the merge step, create work.merged, and then append it to data.merged within the macro loop.
I haven't tested this as in a meeting, however maybe something like - needs all datasets in one library:
data merged; merge in1perc.ses_diag:; by patid; if ina and inb; run;
Note the : after the prefix part, normally it takes anything with the prefix, but not sure about merge.
Question is why do you have lots of same data in dated files in the first place? Every time when I see this I immediately think that the process before could be improved to give one dataset with date in the data - makes everyones life so much easier.
If I'm interpreting the question correctly, you would need to make this sort of change. First, delete your data.merged data set. Then change the last part of your loop where you now have:
data data.merged;
merge data.cohort (in=ina) in1perc.&yr.q&qr. (in=inb);
by patid; if ina and inb;
run;
Instead, replace that section with:
data merged;
merge data.cohort (in=ina)
in1perc.&yr.q&qr. (in=inb);
by patid;
if ina and inb;
run;
proc append data=merged base=data.merged;
run;
You can run this, even when DATA.MERGED does not yet exist. For that case, MERGED will get copied to DATA.MERGED.
@zzfsimona wrote:
Hi there,
I am using SAS 9.4.
I have multiple datasets titled as 2007q1, 2007q2, 2007q3, ..., 2017q3, 2017q4. I also have a dataset titled Cohort. All dataset contain the variable patid.
I want to merge all those &yr.q&qr. datasets with the dataset Cohort by the common variable patid. There should be only 1 final merged dataset. The merged file should contain all and only patids that are in the dataset Cohort.
Here is my code:
%macro mergedata; proc sort data=data.cohort; by patid; run; %do yr=2007 %to 2017; %do qr=1 %to 4; proc sort data=in1perc.ses_diag&yr.q&qr.; by patid; run; data data.merged; merge data.cohort (in=ina) in1perc.&yr.q&qr. (in=inb); by patid; if ina and inb; run; %end; %end; %mend; %mergedata
I know there is something wrong with the ina and inb part, but I cannot figure out how to fix it.
The log file shows that the next dataset in the loop keeps replacing the previous merged dataset, instead of adding into it.
Here is the log for iterations 2007Q4 and 2008Q1. Note how the total observations has decreased in data.Merged.
These two lines don't match:
proc sort data=in1perc.ses_diag&yr.q&qr.;
in1perc.&yr.q&qr. (in=inb)
did you mean to use the sored data? as the data set names are not the same.
I suggest that you do with two data sets manually and see if the results are as desired. I suspect not as MERGE is going to replace like-named variable values.
I suspect that what you actually want to do is append all of the 2007q1, 2007q2 etc sets together and then select.
Some thing like
data work.temp; set in1perc.ses_diag: ; run; proc sql; create table data.merged as select b.* from data.cohort as a left join work.temp on a.patid=b.patid ; quit;
No claims for efficiency except in short code if this actually accomplishes what you want.
The set in1perc.ses_diag: ; will append all data sets in the in1perc library that start with ses_diag.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.