Hi, I have three datasets,
data1
A B C D
1 8 7 9
2 0 . .
3 . . .
data2
A B C D
1 8 . .
2 0 5 0
3 . . .
data3
A B C D
1 8 . .
2 0 . .
3 . 7 4
I need to combine these three datasets, keeping A and B the same and filling in C and D with the nonmissing values from whichever dataset has them. The result I want:
All
A B C D
1 8 7 9
2 0 5 0
3 . 7 4
I am using MERGE, but it does not produce what I want:
data all;
  merge data1 data2 data3;
run;
Any ideas how I can do this?
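For reference, here is a minimal sketch of why a match-merge (MERGE with BY) falls short here. It uses two hypothetical one-row datasets (`one`, `two`) rather than the data above. When more than one dataset contributes an observation to the same BY group, the value from the dataset read last is kept, even when that value is missing.

```sas
/* Two hypothetical datasets that share the key A=1 */
data one;
  input A C;
  datalines;
1 7
;
run;

data two;
  input A C;
  datalines;
1 .
;
run;

/* In a match-merge the value read last (from TWO) is kept, */
/* even though it is missing, so C ends up missing.         */
data merged;
  merge one two;
  by A;
run;

proc print data=merged;
run;
```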
Use the UPDATE statement instead of the MERGE statement. UPDATE is designed for applying transactions, so missing values in the transaction data do not change the existing value.
UPDATE takes exactly two datasets: the first is the master (source) and the second holds the transactions. You could do it in two steps:
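data want;
  /* Apply DATA2 as transactions to the master DATA1 */
  update data1 data2;
  by A;
run;
data want;
  /* Then apply DATA3 as transactions to that result */
  update want data3;
  by A;
run;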
Or first combine the datasets into one and then use that as both the master and the transaction dataset. To save temporary disk space, and perhaps time, you could use a DATA step view to do the combining.
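/* Interleave DATA1-DATA3 by A; as a view, nothing is written to disk */
data all / view=all;
  set data1-data3;
  by A;
run;
data want;
  /* OBS=0 gives an empty master with the right variables, */
  /* so every record in ALL is applied as a transaction.   */
  update all(obs=0) all;
  by A;
run;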
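Here is a small sketch of how UPDATE handles several transaction records for one BY value. It uses hypothetical datasets `master` and `trans`, which are not part of the thread. Each nonmissing value is applied in order, missing values are skipped, and only one observation is written per BY group. When two transactions carry different nonmissing values for the same variable, the later one wins.

```sas
/* Hypothetical master dataset, sorted by A */
data master;
  input A B C D;
  datalines;
1 8 . .
;
run;

/* Hypothetical transactions for the same key, in the order applied */
data trans;
  input A B C D;
  datalines;
1 . 5 .
1 . 7 4
1 . . .
;
run;

/* C becomes 5, then 7; D becomes 4; B stays 8 because the     */
/* missing B values are not applied. One row is written for A=1. */
data result;
  update master trans;
  by A;
run;

proc print data=result;
run;
```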
Hi:
This is an instance where I would probably use an UPDATE statement. With UPDATEMODE=MISSINGCHECK (the default), missing values in DATA2 and DATA3 do not overwrite existing values.
data data1 ;
infile datalines;
input A B C D;
datalines;
1 8 7 9
2 0 . .
3 . . .
;
run;
data data2;
infile datalines;
input A B C D;
datalines;
1 8 . .
2 0 5 0
3 . . .
;
run;
data data3;
infile datalines;
input A B C D;
datalines;
1 8 . .
2 0 . .
3 . 7 4
;
run;
** put all update data in one file;
data allupdate;
set data2 data3;
run;
proc sort data=allupdate;
by a;
run;
data final;
update data1 allupdate updatemode=missingcheck;
by a;
run;
proc print data=final;
title 'After applying all updates';
run;
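To see what the UPDATEMODE= option controls, here is a sketch contrasting the two settings. It uses hypothetical one-row datasets `master` and `trans`. With MISSINGCHECK, a missing transaction value leaves the master value alone. With NOMISSINGCHECK, it overwrites the master value.

```sas
/* Hypothetical master with a known value of C */
data master;
  input A C;
  datalines;
1 7
;
run;

/* Hypothetical transaction whose C is missing */
data trans;
  input A C;
  datalines;
1 .
;
run;

/* MISSINGCHECK (default): the missing C is ignored, so C stays 7 */
data keep_value;
  update master trans updatemode=missingcheck;
  by A;
run;

/* NOMISSINGCHECK: the missing C replaces the master value */
data overwrite_value;
  update master trans updatemode=nomissingcheck;
  by A;
run;
```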
(modified to include the BY A statement in the UPDATE step)
Cynthia
Some questions:
Does any combination of A and B values appear more than once in ANY of the datasets?
If so, you will need to provide the actual rules for which records get which results.
Second, if C or D has values in more than one dataset, which dataset's value should be kept?
For example, with the following, what would be the rule for choosing which value to keep? (Not an example: the RULE for selecting.)
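One way to answer the first question is to let PROC SORT report repeated keys. This sketch uses a hypothetical dataset `data2_dup` with a repeated A/B pair. The NODUPKEY option keeps the first record for each key, and DUPOUT= writes every later record with the same key to a separate dataset. An empty `dups` dataset therefore means the key is unique. You would run the same step against DATA1, DATA2 and DATA3 in turn.

```sas
/* Hypothetical data: the key A=2, B=0 appears twice */
data data2_dup;
  input A B C D;
  datalines;
1 8 . .
2 0 5 0
2 0 6 .
3 . . .
;
run;

/* NODUPKEY keeps the first record per A/B key in CHECKED; */
/* DUPOUT= collects each later record with a repeated key. */
proc sort data=data2_dup out=checked nodupkey dupout=dups;
  by A B;
run;

/* Any observations in DUPS mean the A/B key is not unique */
proc print data=dups;
  title 'Records with a repeated A/B key';
run;
```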