Question about merging data sets in SAS Studio

mcmaxwell · Posted 12-18-2020 11:14 AM

Hello, I am new to SAS Studio and would appreciate any help/information!

For my research I have many different CSV files that contain different variables. Below is an example of the code I might use to merge these files:

/** outer full merge the data **/ 
data new; 
	update gb20 (in=left) PLTL (in=right); 
	by SUID; 
	run;

/** left merge the data **/ 
data new; 
	update gb20 (in=left) PLTL (in=right); 
	by SUID;
	if left;
	run;

/** right merge the data **/ 
data new; 
	update gb20 (in=left) PLTL (in=right); 
	by SUID;
	if right;
	run;

/** inner merge the data **/ 
data new; 
	update gb20 (in=left) PLTL (in=right); 
	by SUID;
	if left and right;
	run;

Where "gb20" and "PLTL" are the names of the data sets, "new" is the name of the newly merged data set, and "SUID" is the key.

Now, let's say there is a third data set called "demo" that I want to merge with "gb20" and "PLTL" as well. Is there a way that I can merge all three data sets in one step? Or, do I need to add them together one by one? (See example below)

/** outer full merge the data again **/  
data full;  
	update demo (in=left) new (in=right);  
	by SUID;  
	run;

Thanks in advance!

ballardw · Posted 12-18-2020 11:37 AM

@mcmaxwell wrote:

Hello, I am new to SAS Studio and would appreciate any help/information!

For my research I have many different CSV files that contain different variables. Below is an example of the code I might use to merge these files:
/** outer full merge the data **/ 
data new; 
	update gb20 (in=left) PLTL (in=right); 
	by SUID; 
	run;

/** left merge the data **/ 
data new; 
	update gb20 (in=left) PLTL (in=right); 
	by SUID;
	if left;
	run;

/** right merge the data **/ 
data new; 
	update gb20 (in=left) PLTL (in=right); 
	by SUID;
	if right;
	run;

/** inner merge the data **/ 
data new; 
	update gb20 (in=left) PLTL (in=right); 
	by SUID;
	if left and right;
	run;
Where "gb20" and "PLTL" are the names of the data sets, "new" is the name of the newly merged data set, and "SUID" is the key.

Now, let's say there is a third data set called "demo" that I want to merge with "gb20" and "PLTL" as well. Is there a way that I can merge all three data sets in one step? Or, do I need to add them together one by one? (See example below)
/** outer full merge the data again **/  
data full;  
	update demo (in=left) new (in=right);  
	by SUID;  
	run;
Thanks in advance!

The workable approach is dependent to a fair extent on the data structure and content.

UPDATE is not the same as MERGE or MODIFY.

Update if you have multiple records with the same BY variables in the updating set (the second one) will apply all the updates to the single record in the master data set. MERGE if you have multiple records with the same by values in the second set will have multiple records.

Example:

data master;
   input x y;
datalines;
1 11
2 22
3 33
;

data transaction;
   input x y;
datalines;
1 5
1 6
1 7
3 16
3 8
;

data updated;
   update master transaction;
   by x;
run;

data merged;
  merge master transaction;
  by x;
run;

Update only allows a master and transaction data set. Also the behavior of missing values in the transaction data set can be modified in Update but not merge. Modify is somewhat similar to Update with additional features.

Also the order of multiple data sets on a merge statement is important. Adding to the above:

data other;
   input x y z;
datalines;
1 111 345
2 222 333
2 333 444
2 .   345
;

data merged2;
  merge master transaction other;
  by x;
run;

data merged3;
  merge master other transaction;
  by x;
run;

Check the value of Y in the first record of Merged2 and Merged3 as an example.

So whether a merge would accomplish what you want depends on contents of the data sets.

Tom · Posted 12-18-2020 11:44 AM

Note that SAS/Studio is a user interface to allow you to run SAS code. Your question is about SAS the language itself and not about the SAS/Studio interface.

The data step MERGE will NOT handle many to many joins the same way as tools like SQL will. When you have more than one record per BY group the records from the contributing datasets are matched by the order they appear. Once one of the source datasets runs out of records for a BY group then values of variables contributed only from that dataset are just left as they were for the last record read for the group from that source dataset.

That said to merge multiple dataset just list them all on the MERGE statement. As many as your want.

data want;
  merge ds1 ds2 ds3 ;
  by id;
run;

Note to stack the data (think SQL UNION) use a SET statement instead of MERGE statement.

data want;
  set ds1 ds2 ds3 ;
run;

If you add a BY statement the the records from the various source datasets will be interleaved.

data want;
  set ds1 ds2 ds3 ;
  by id;
run;

Question about merging data sets in SAS Studio

Re: Question about merging data sets in SAS Studio

Re: Question about merging data sets in SAS Studio

Question about merging data sets in SAS Studio

Re: Question about merging data sets in SAS Studio

Re: Question about merging data sets in SAS Studio

SAS Innovate 2025: Register Now