DATA Step, Macro, Functions and more

Merge multiple datasets with one dataset using macro

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Merge multiple datasets with one dataset using macro

Hi there, 

I am using SAS 9.4.

I have multiple datasets titled as 2007q1, 2007q2, 2007q3, ..., 2017q3, 2017q4. I also have a dataset titled Cohort. All dataset contain the variable patid. 

I want to merge all those &yr.q&qr. datasets with the dataset Cohort by the common variable patid. There should be only 1 final merged dataset. The merged file should contain all and only patids that are in the dataset Cohort. 

Here is my code:

%macro mergedata;
	proc sort data=data.cohort; by patid; run;
	
	%do yr=2007 %to 2017;
		%do qr=1 %to 4;
			proc sort data=in1perc.ses_diag&yr.q&qr.; 
				by patid; 
			run;
			data data.merged;
				merge
				data.cohort (in=ina)
				in1perc.&yr.q&qr. (in=inb);
				by patid;
				if ina and inb;
			run;
		%end;
	%end;
%mend;

%mergedata

I know there is something wrong with the ina and inb part, but I cannot figure out how to fix it. 

The log file shows that the next dataset in the loop keeps replacing the previous merged dataset, instead of adding into it.

Here is the log for iterations 2007Q4 and 2008Q1. Note how the total observations has decreased in data.Merged.

 

NOTE: There were 1687 observations read from the data set DATA.COHORT.
NOTE: There were 712195 observations read from the data set IN1PERC.2007Q4.
NOTE: The data set DATA.MERGED has 7185 observations and 17 variables.
NOTE: There were 1687 observations read from the data set DATA.COHORT.
NOTE: There were 731171 observations read from the data set IN1PERC.2008Q1.
NOTE: The data set DATA.MERGED has 7031 observations and 17 variables.

 How do I fix this? Thank you for your help!


Accepted Solutions
Solution
a month ago
Super User
Posts: 6,785

Re: Merge multiple datasets with one dataset using macro

[ Edited ]
Posted in reply to zzfsimona

If I'm interpreting the question correctly, you would need to make this sort of change.  First, delete your data.merged data set.  Then change the last part of your loop where you now have:

 

 

 data data.merged;
merge data.cohort (in=ina) in1perc.&yr.q&qr. (in=inb);
by patid; if ina and inb;
run;

 

Instead, replace that section with:

 


data merged;
merge data.cohort (in=ina)
      in1perc.&yr.q&qr. (in=inb);
      by patid;
      if ina and inb;
run;
proc append data=merged base=data.merged;
run;

You can run this, even when DATA.MERGED does not yet exist.  For that case, MERGED will get copied to DATA.MERGED.

View solution in original post


All Replies
Super User
Posts: 10,280

Re: Merge multiple datasets with one dataset using macro

Posted in reply to zzfsimona

Instead of creating data.merged in the merge step, create work.merged, and then append it to data.merged within the macro loop.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Super User
Posts: 9,599

Re: Merge multiple datasets with one dataset using macro

Posted in reply to zzfsimona

I haven't tested this as in a meeting, however maybe something like - needs all datasets in one library:

data merged;
  merge in1perc.ses_diag:;
  by patid;
  if ina and inb;
run;

Note the : after the prefix part, normally it takes anything with the prefix, but not sure about merge.

 

Question is why do you have lots of same data in dated files in the first place?  Every time when I see this I immediately think that the process before could be improved to give one dataset with date in the data - makes everyones life so much easier.

Solution
a month ago
Super User
Posts: 6,785

Re: Merge multiple datasets with one dataset using macro

[ Edited ]
Posted in reply to zzfsimona

If I'm interpreting the question correctly, you would need to make this sort of change.  First, delete your data.merged data set.  Then change the last part of your loop where you now have:

 

 

 data data.merged;
merge data.cohort (in=ina) in1perc.&yr.q&qr. (in=inb);
by patid; if ina and inb;
run;

 

Instead, replace that section with:

 


data merged;
merge data.cohort (in=ina)
      in1perc.&yr.q&qr. (in=inb);
      by patid;
      if ina and inb;
run;
proc append data=merged base=data.merged;
run;

You can run this, even when DATA.MERGED does not yet exist.  For that case, MERGED will get copied to DATA.MERGED.

Super User
Posts: 13,583

Re: Merge multiple datasets with one dataset using macro

Posted in reply to zzfsimona

@zzfsimona wrote:

Hi there, 

I am using SAS 9.4.

I have multiple datasets titled as 2007q1, 2007q2, 2007q3, ..., 2017q3, 2017q4. I also have a dataset titled Cohort. All dataset contain the variable patid. 

I want to merge all those &yr.q&qr. datasets with the dataset Cohort by the common variable patid. There should be only 1 final merged dataset. The merged file should contain all and only patids that are in the dataset Cohort. 

Here is my code:

%macro mergedata;
	proc sort data=data.cohort; by patid; run;
	
	%do yr=2007 %to 2017;
		%do qr=1 %to 4;
			proc sort data=in1perc.ses_diag&yr.q&qr.; 
				by patid; 
			run;
			data data.merged;
				merge
				data.cohort (in=ina)
				in1perc.&yr.q&qr. (in=inb);
				by patid;
				if ina and inb;
			run;
		%end;
	%end;
%mend;

%mergedata

I know there is something wrong with the ina and inb part, but I cannot figure out how to fix it. 

The log file shows that the next dataset in the loop keeps replacing the previous merged dataset, instead of adding into it.

Here is the log for iterations 2007Q4 and 2008Q1. Note how the total observations has decreased in data.Merged.

 


These two lines don't match:

proc sort data=in1perc.ses_diag&yr.q&qr.; 
 
in1perc.&yr.q&qr. (in=inb)

did you mean to use the sored data? as the data set names are not the same.

 

 

I suggest that you do with two data sets manually and see if the results are as desired. I suspect not as MERGE is going to replace like-named variable values.

I suspect that what you actually want to do is append all of the 2007q1, 2007q2 etc sets together and then select.

Some thing like

data work.temp;
   set  in1perc.ses_diag: ;
run;

proc sql;
   create table data.merged as
   select b.*
   from data.cohort as a
        left join
        work.temp
        on a.patid=b.patid
   ;
quit;

No claims for efficiency except in short code if this actually accomplishes what you want.

 

 

The set in1perc.ses_diag: ; will append all data sets in the in1perc library that start with ses_diag.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 80 views
  • 0 likes
  • 5 in conversation