Hi SAS users,
I would like to combine (!) two files having the same variables but with different values, as below:
Dataset1
Variable1 Variable2 Variable3
S A .
Dataset2
Variable1 Variable2 Variable3
S . B
S . C
and I would like to get this:
Variable1 Variable2 Variable3
S A B
S A C
I tried MERGE BY statement for Variable1 but the result is not what I want. Because it gives me:
Variable1 Variable2 Variable3
S . B
S . C
S A .
Thank you for help.
Post the code you used.
You can probably try an update statement.
I used just a simple Merge statement:
data dataset3;
merge dataset1 dataset2;
by Variable1;
run;
Kurt directed you to the PDV. I emphasize this when I teach intro classes. In the very least, the order of your data sets in the MERGE statements is wrong: dataset2.VARIABLE2 will overwrite dataset1.VARIABLE1. As in the SAS-L era, consider posting a description of your data and your goals.
When we need to assume, then we may speculate wildly.
HTH,
Kevin
When doing a MERGE, care must be taken with variables that share the same name. Which value takes precedence when the PDV is filled depends on the order in whcih records are read.
I recommend renaming the variables of one dataset and include code that forces correct assignments to the wanted variables in the output dataset.
It could be MERGE two datasets.
data Dataset1;
input Variable1 $ Variable2 $ Variable3 $;
cards;
S A .
;
data Dataset2;
input Variable1 $ Variable2 $ Variable3 $;
cards;
S . B
S . C
;
run;
data want;
merge Dataset1 Dataset2(rename=(Variable1-Variable3=v1-v3));
length new1-new3 temp1-temp3 $ 40;
retain temp1-temp3;
new1=coalescec(Variable1,v1) ;
if missing(new1) then new1=temp1;
else temp1=new1;
new2=coalescec(Variable2,v2) ;
if missing(new2) then new2=temp2;
else temp2=new2;
new3=coalescec(Variable3,v3) ;
if missing(new3) then new3=temp3;
else temp3=new3;
keep new1-new3;
run;
It would be better to have more details. A MERGE, especially with a RENAME or KEEP data set option, is just one of the approaches. We do not know the sizes of the datasets, the number of replicates, or the rules. Given that, I might suggest a HASH because the data set(s) might not be sorted.
data dataset1 ;
variable1 = "S" ;
variable2 = "A" ;
variable3 = " " ;
run ;
data dataset2 ;
variable1 = "S" ;
variable2 = " " ;
variable3 = "B" ;
output ;
variable1 = "S" ;
variable2 = " " ;
variable3 = "C" ;
output ;
run ;
data dataset3
( drop = __: )
;
if 0 then set dataset1 ;
if _n_ = 1
then
do ;
declare hash _S ( dataset: "dataset1" ) ;
__rc = _S.DefineKey ( "variable1" ) ;
__rc = _S.DefineData( "variable2" ) ;
__rc = _S.DefineDone() ;
end ;
set dataset2 ;
__rc = _S.find( ) ;
run ;
Of course, we do not know if VARIABLE3 needs to be updated if missing. Depending on the complexity of the rules (and data) another HASH can be defined or the KEY/VALUEs can be added using ADD or REF methods as needed.
HTH,
Kevin
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.