Hi SAS users,
I would like to combine (!) two files having the same variables but with different values, as below:
Dataset1
Variable1 Variable2 Variable3
S A .
Dataset2
Variable1 Variable2 Variable3
S . B
S . C
and I would like to get this:
Variable1 Variable2 Variable3
S A B
S A C
I tried MERGE BY statement for Variable1 but the result is not what I want. Because it gives me:
Variable1 Variable2 Variable3
S . B
S . C
S A .
Thank you for help.
Post the code you used.
You can probably try an update statement.
I used just a simple Merge statement:
data dataset3;
merge dataset1 dataset2;
by Variable1;
run;
Kurt directed you to the PDV. I emphasize this when I teach intro classes. In the very least, the order of your data sets in the MERGE statements is wrong: dataset2.VARIABLE2 will overwrite dataset1.VARIABLE1. As in the SAS-L era, consider posting a description of your data and your goals.
When we need to assume, then we may speculate wildly.
HTH,
Kevin
When doing a MERGE, care must be taken with variables that share the same name. Which value takes precedence when the PDV is filled depends on the order in whcih records are read.
I recommend renaming the variables of one dataset and include code that forces correct assignments to the wanted variables in the output dataset.
It could be MERGE two datasets.
data Dataset1;
input Variable1 $ Variable2 $ Variable3 $;
cards;
S A .
;
data Dataset2;
input Variable1 $ Variable2 $ Variable3 $;
cards;
S . B
S . C
;
run;
data want;
merge Dataset1 Dataset2(rename=(Variable1-Variable3=v1-v3));
length new1-new3 temp1-temp3 $ 40;
retain temp1-temp3;
new1=coalescec(Variable1,v1) ;
if missing(new1) then new1=temp1;
else temp1=new1;
new2=coalescec(Variable2,v2) ;
if missing(new2) then new2=temp2;
else temp2=new2;
new3=coalescec(Variable3,v3) ;
if missing(new3) then new3=temp3;
else temp3=new3;
keep new1-new3;
run;
It would be better to have more details. A MERGE, especially with a RENAME or KEEP data set option, is just one of the approaches. We do not know the sizes of the datasets, the number of replicates, or the rules. Given that, I might suggest a HASH because the data set(s) might not be sorted.
data dataset1 ;
variable1 = "S" ;
variable2 = "A" ;
variable3 = " " ;
run ;
data dataset2 ;
variable1 = "S" ;
variable2 = " " ;
variable3 = "B" ;
output ;
variable1 = "S" ;
variable2 = " " ;
variable3 = "C" ;
output ;
run ;
data dataset3
( drop = __: )
;
if 0 then set dataset1 ;
if _n_ = 1
then
do ;
declare hash _S ( dataset: "dataset1" ) ;
__rc = _S.DefineKey ( "variable1" ) ;
__rc = _S.DefineData( "variable2" ) ;
__rc = _S.DefineDone() ;
end ;
set dataset2 ;
__rc = _S.find( ) ;
run ;
Of course, we do not know if VARIABLE3 needs to be updated if missing. Depending on the complexity of the rules (and data) another HASH can be defined or the KEY/VALUEs can be added using ADD or REF methods as needed.
HTH,
Kevin
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.