DATA Step, Macro, Functions and more

How to combine (not merge) two data sets?

Reply
Contributor
Posts: 72

How to combine (not merge) two data sets?

[ Edited ]

Hi SAS users,

 

I would like to combine (!) two files having the same variables but with different values, as below:

 

Dataset1

 

Variable1    Variable2    Variable3 

    S                  A                 .         

 

 

 

Dataset2

 

Variable1    Variable2    Variable3 

    S                    .                  B

    S                    .                  C

 

 

 

and I would like to get this:

 

   

Variable1    Variable2    Variable3 

    S                    A                B

    S                    A                C

 

 

I tried MERGE BY statement for Variable1 but the result is not what I want. Because it gives me:

 

Variable1    Variable2    Variable3 

    S                    .                  B

    S                    .                  C

    S                    A                 .

 

 

Thank you for help.

  

Super User
Posts: 17,775

Re: How to combine (not merge) two data sets?

Post the code you used. 

You can probably try an update statement. 

Contributor
Posts: 72

Re: How to combine (not merge) two data sets?

I used just a simple Merge statement:

 

data dataset3;

merge dataset1 dataset2;

by Variable1;

run;

Frequent Contributor
Posts: 79

Re: How to combine (not merge) two data sets?

Kurt directed you to the PDV.  I emphasize this when I teach intro classes.  In the very least, the order of your data sets in the MERGE statements is wrong: dataset2.VARIABLE2 will overwrite dataset1.VARIABLE1.  As in the SAS-L era, consider posting a description of your data and your goals.

 

When we need to assume, then we may speculate wildly.

 

HTH,

 

Kevin

Super User
Posts: 6,928

Re: How to combine (not merge) two data sets?

When doing a MERGE, care must be taken with variables that share the same name. Which value takes precedence when the PDV is filled depends on the order in whcih records are read.

I recommend renaming the variables of one dataset and include code that forces correct assignments to the wanted variables in the output dataset.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,255

Re: How to combine (not merge) two data sets?

This is a perfect and a simple job for an SQL join. Especially if your real data is more complex than your sample.
Data never sleeps
Super User
Posts: 9,671

Re: How to combine (not merge) two data sets?

It could be MERGE two datasets.

 

 

data Dataset1;
 input Variable1 $   Variable2 $   Variable3 $;
 cards; 
    S                  A                 .       
;  
data Dataset2;
 input Variable1  $  Variable2   $ Variable3 $;
 cards; 
    S               .                  B
    S                    .                  C
;
run;

data want;
 merge Dataset1 Dataset2(rename=(Variable1-Variable3=v1-v3));
 length new1-new3 temp1-temp3 $ 40;
 retain temp1-temp3;
 new1=coalescec(Variable1,v1) ;
 if missing(new1) then new1=temp1;
  else temp1=new1;
  
 new2=coalescec(Variable2,v2) ;
 if missing(new2) then new2=temp2;
  else temp2=new2;
  
 new3=coalescec(Variable3,v3) ;
 if missing(new3) then new3=temp3;
  else temp3=new3;
 keep new1-new3;
run;
Frequent Contributor
Posts: 79

Re: How to combine (not merge) two data sets?

It would be better to have more details.  A MERGE, especially with a RENAME or KEEP data set option, is just one of the approaches.  We do not know the sizes of the datasets, the number of replicates, or the rules.  Given that, I might suggest a HASH because the data set(s) might not be sorted.  

 

data dataset1 ;
  variable1 = "S" ;
  variable2 = "A" ;
  variable3 = " " ;
run ;

 

data dataset2 ;
  variable1 = "S" ;
  variable2 = " " ;
  variable3 = "B" ;
  output ;
  variable1 = "S" ;
  variable2 = " " ;
  variable3 = "C" ;
  output ;
run ;

 

data dataset3
        ( drop = __: )
     ;

if 0 then set dataset1 ;

if _n_ = 1
then
  do ;
     declare hash _S ( dataset: "dataset1" ) ;
     __rc = _S.DefineKey ( "variable1" ) ;
     __rc = _S.DefineData( "variable2" ) ;
     __rc = _S.DefineDone() ;
  end ;

  set dataset2 ;

  __rc = _S.find( ) ;

run ;

 

Of course, we do not know if VARIABLE3 needs to be updated if missing.  Depending on the complexity of the rules (and data) another HASH can be defined or the KEY/VALUEs can be added using ADD or REF methods as needed.

 

HTH,

 

Kevin

Ask a Question
Discussion stats
  • 7 replies
  • 309 views
  • 0 likes
  • 6 in conversation