Solved: Merge multiple datasets with one dataset using macro

zzfsimona · Posted 06-21-2018 02:45 AM

Hi there,

I am using SAS 9.4.

I have multiple datasets titled as 2007q1, 2007q2, 2007q3, ..., 2017q3, 2017q4. I also have a dataset titled Cohort. All dataset contain the variable patid.

I want to merge all those &yr.q&qr. datasets with the dataset Cohort by the common variable patid. There should be only 1 final merged dataset. The merged file should contain all and only patids that are in the dataset Cohort.

Here is my code:

%macro mergedata;
	proc sort data=data.cohort; by patid; run;
	
	%do yr=2007 %to 2017;
		%do qr=1 %to 4;
			proc sort data=in1perc.ses_diag&yr.q&qr.; 
				by patid; 
			run;
			data data.merged;
				merge
				data.cohort (in=ina)
				in1perc.&yr.q&qr. (in=inb);
				by patid;
				if ina and inb;
			run;
		%end;
	%end;
%mend;

%mergedata

I know there is something wrong with the ina and inb part, but I cannot figure out how to fix it.

The log file shows that the next dataset in the loop keeps replacing the previous merged dataset, instead of adding into it.

Here is the log for iterations 2007Q4 and 2008Q1. Note how the total observations has decreased in data.Merged.

NOTE: There were 1687 observations read from the data set DATA.COHORT.
NOTE: There were 712195 observations read from the data set IN1PERC.2007Q4.
NOTE: The data set DATA.MERGED has 7185 observations and 17 variables.

NOTE: There were 1687 observations read from the data set DATA.COHORT.
NOTE: There were 731171 observations read from the data set IN1PERC.2008Q1.
NOTE: The data set DATA.MERGED has 7031 observations and 17 variables.

How do I fix this? Thank you for your help!

Astounding · Posted 06-21-2018 06:53 AM

If I'm interpreting the question correctly, you would need to make this sort of change. First, delete your data.merged data set. Then change the last part of your loop where you now have:

 data data.merged;
 merge data.cohort (in=ina) in1perc.&yr.q&qr. (in=inb);
 by patid; if ina and inb; 
run;

Instead, replace that section with:


data merged;
merge data.cohort (in=ina)
      in1perc.&yr.q&qr. (in=inb);
      by patid;
      if ina and inb;
run;
proc append data=merged base=data.merged;
run;

You can run this, even when DATA.MERGED does not yet exist. For that case, MERGED will get copied to DATA.MERGED.

View solution in original post

Kurt_Bremser · Posted 06-21-2018 03:16 AM

Instead of creating data.merged in the merge step, create work.merged, and then append it to data.merged within the macro loop.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RW9 · Posted 06-21-2018 04:05 AM

I haven't tested this as in a meeting, however maybe something like - needs all datasets in one library:

data merged;
  merge in1perc.ses_diag:;
  by patid;
  if ina and inb;
run;

Note the : after the prefix part, normally it takes anything with the prefix, but not sure about merge.

Question is why do you have lots of same data in dated files in the first place? Every time when I see this I immediately think that the process before could be improved to give one dataset with date in the data - makes everyones life so much easier.

Astounding · Posted 06-21-2018 06:53 AM

If I'm interpreting the question correctly, you would need to make this sort of change. First, delete your data.merged data set. Then change the last part of your loop where you now have:

 data data.merged;
 merge data.cohort (in=ina) in1perc.&yr.q&qr. (in=inb);
 by patid; if ina and inb; 
run;

Instead, replace that section with:


data merged;
merge data.cohort (in=ina)
      in1perc.&yr.q&qr. (in=inb);
      by patid;
      if ina and inb;
run;
proc append data=merged base=data.merged;
run;

You can run this, even when DATA.MERGED does not yet exist. For that case, MERGED will get copied to DATA.MERGED.

ballardw · Posted 06-21-2018 11:06 AM

@zzfsimona wrote:

Hi there,

I am using SAS 9.4.

I have multiple datasets titled as 2007q1, 2007q2, 2007q3, ..., 2017q3, 2017q4. I also have a dataset titled Cohort. All dataset contain the variable patid.

I want to merge all those &yr.q&qr. datasets with the dataset Cohort by the common variable patid. There should be only 1 final merged dataset. The merged file should contain all and only patids that are in the dataset Cohort.

Here is my code:
%macro mergedata;
	proc sort data=data.cohort; by patid; run;
	
	%do yr=2007 %to 2017;
		%do qr=1 %to 4;
			proc sort data=in1perc.ses_diag&yr.q&qr.; 
				by patid; 
			run;
			data data.merged;
				merge
				data.cohort (in=ina)
				in1perc.&yr.q&qr. (in=inb);
				by patid;
				if ina and inb;
			run;
		%end;
	%end;
%mend;

%mergedata
I know there is something wrong with the ina and inb part, but I cannot figure out how to fix it.

The log file shows that the next dataset in the loop keeps replacing the previous merged dataset, instead of adding into it.

Here is the log for iterations 2007Q4 and 2008Q1. Note how the total observations has decreased in data.Merged.

These two lines don't match:

proc sort data=in1perc.ses_diag&yr.q&qr.;

in1perc.&yr.q&qr. (in=inb)

did you mean to use the sored data? as the data set names are not the same.

I suggest that you do with two data sets manually and see if the results are as desired. I suspect not as MERGE is going to replace like-named variable values.

I suspect that what you actually want to do is append all of the 2007q1, 2007q2 etc sets together and then select.

Some thing like

data work.temp;
   set  in1perc.ses_diag: ;
run;

proc sql;
   create table data.merged as
   select b.*
   from data.cohort as a
        left join
        work.temp
        on a.patid=b.patid
   ;
quit;

No claims for efficiency except in short code if this actually accomplishes what you want.

The set in1perc.ses_diag: ; will append all data sets in the in1perc library that start with ses_diag.

Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

Re: Merge multiple datasets with one dataset using macro

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away