Solved: Re: How to combine three datasets with same variables in SAS?

ph6 · Posted 06-07-2020 12:50 PM

Hi, I have three datasets,

data1 
A   B  C  D
1   8  7  9
2   0  .  .
3   .  .  .

data2
A   B  C   D
1   8  .   .
2   0  5   0
3   .  .   .

data3
A   B  C  D
1   8  .  .
2   0  .  .
3   .  7  4

I need to combine these three datasets keeping A, B the same and combining C and D variables.

All 
A   B  C  D
1   8  7  9
2   0  5  0
3   .  7  4

I am using merge, but it does not produce what I want:

data all;
 merge data1 data2 data3;
run;

Any ideas on how I can do this?

Tom · Posted 06-07-2020 01:26 PM

Use the UPDATE statement instead of the MERGE statement. UPDATE is designed for applying transactions, so missing values in the transaction dataset do not replace existing values.

It needs two datasets to work: the first is the source and the second holds the transactions. You could either do it in two steps:

data want;
   update data1 data2;   /* apply data2 to data1 */
   by A;
run;
data want;
   update want data3;    /* then apply data3 to the result */
   by A;
run;

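Or first combine the datasets into one and then use that as both the source and transaction datasets. Reading the source with OBS=0 gives an empty master, so every observation is applied as a transaction. To save temporary disk space, and perhaps time, you could use a DATA step view to do the combining.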

data all / view=all;
   set data1-data3;
   by A;
run;
data want ;
   update all(obs=0) all;
   by A;
run;
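
To see why UPDATE fits this problem, it can help to run it next to a match-merge on the sample data from the question. The sketch below is only an illustration and not part of the original answer; the dataset names merged, step1, and updated are made up for this example. It assumes data1, data2, and data3 exist as posted and are sorted by A.

```sas
/* Match-merge: within each value of A, values read from a later     */
/* dataset replace values from an earlier one, even if missing.      */
data merged;
   merge data1 data2 data3;
   by A;
run;

/* UPDATE: missing values in the transaction dataset leave the       */
/* master value unchanged (the default, UPDATEMODE=MISSINGCHECK).    */
data step1;
   update data1 data2;
   by A;
run;
data updated;
   update step1 data3;
   by A;
run;

proc print data=merged;  title 'MERGE with BY A';  run;
proc print data=updated; title 'UPDATE with BY A'; run;
title;
```

With the sample data, merged should keep only the C and D values that come from data3, while updated should match the table the question asks for.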


Cynthia_sas · Posted 06-07-2020 01:24 PM

Hi:

This is an instance where I would probably use an UPDATE statement. With the UPDATEMODE=MISSINGCHECK option, you can make sure that nothing gets overwritten by the missing values in DATA2 and DATA3.

data data1 ;
infile datalines;
input A   B  C  D;
datalines;
1   8  7  9
2   0  .  .
3   .  .  .
;
run;

data data2;
infile datalines;
input A   B  C  D;
datalines;
1   8  .   .
2   0  5   0
3   .  .   .
;
run;

data data3;
infile datalines;
input A   B  C  D;
datalines;
1   8  .  .
2   0  .  .
3   .  7  4
;
run;

** put all update data in one file;
data allupdate;
  set data2 data3;
run;
  
proc sort data=allupdate;
by a;
run;
  
data final;
  update data1 allupdate updatemode=missingcheck;
  by a;
run;

proc print data=final;
title 'After applying all updates';
run;

(modified to include by a statement in UPDATE program)

Cynthia
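
A side note on the option used above: MISSINGCHECK is the default for UPDATEMODE=, so the step should behave the same without it. A minimal sketch of the opposite setting, NOMISSINGCHECK, shows what the option protects against. The dataset name final_nocheck is made up for this illustration, and the code assumes data1 and the sorted allupdate from the program above.

```sas
/* NOMISSINGCHECK lets missing values in the transaction dataset */
/* replace the values already held in the master dataset.        */
data final_nocheck;
  update data1 allupdate updatemode=nomissingcheck;
  by a;
run;

proc print data=final_nocheck;
title 'Updates applied without the missing-value check';
run;
title;
```

Comparing this listing with FINAL should show C and D values being blanked out wherever a later transaction row carries missing values.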

ph6 · Posted 06-07-2020 01:47 PM

It works, but I also needed to add BY A to the UPDATE step. Thank you!

ballardw · Posted 06-07-2020 01:28 PM

Some questions:

Does any combination of A and B values occur more than once in ANY of the data sets?

If so, you will need to provide actual rules for which records get which results.
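
One way to answer that first question is to count the rows per key in each data set. The query below is a sketch for DATA1 only; it assumes the key is the A and B pair and would need to be repeated for DATA2 and DATA3. The column alias n_rows is just a name picked for this example.

```sas
/* List every A/B combination that appears more than once in DATA1. */
/* If no rows are selected, the keys are unique.                    */
proc sql;
   select A, B, count(*) as n_rows
   from data1
   group by A, B
   having count(*) > 1;
quit;
```

This matters for UPDATE in particular: the master data set is expected to have one observation per BY value, while the transaction data set may have several, which are applied in order.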

Second, if the variables C and D have values in more than one set which particular data set's value should be kept?

Such as with the following, what would be the rule to keep which value (NOT an example, the RULE for selecting)?
