Solved: Re: joining two tables

Anita_n · Posted 10-11-2021 12:26 PM

Dear all,

I have a problem which is giving me headache. I will appreciate any help.

I have this dataset

data sampledata2011;
infile datalines missover;
input var1_2011 $ var2_2011 $ var3_2011 $ var4_2011 $ var5_2011 $ var5name_2011 $;
datalines; 
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
run;

I need to merge it with this dataset

data sampledata2012;
infile datalines missover;
input var1_2012 $ var2_2012 $ var3_2012 $ var4_2012 $ var5_2012 $ var5name_2012 $;
datalines; 
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;

dataset "sampledata2012" include changes which have been made to sampledata2011 like in the first row var5name has been changed from ABN to NCB. It also contains more datalines.

I want to have at the end a data that looks like this

data want;
infile datalines missover;
input var1_2011 $ var2_2011 $ var3_2011 $ var4_2011 $ var5_2011 $ var5name_2011 $
var1_2012 $ var2_2012 $ var3_2012 $ var4_2012 $ var5_2012 $ var5name_2012 $;
datalines; 
01 0 58 5811     ABN 01 0 58 5811     NCB 
01 0 58 5811 007 ABS 01 0 58 5811 007 ABS 
01 0 58 5811 123 ABE 01 0 58 5811 123 ABE 
01 0 58 5811 459 ABS 01 0 58 5811 459 ABS 
01 0 58 5811 326 ABE 01 0 58 5811 326 ABE 
01 0 02 6521     CGM 01 0 02 6521     CGM 
01 0 02 6521 001 CGN 01 0 02 6521 001 CGN 
01 0 02 6521 822 CGM 01 0 02 6521 822 CGM 
01 0 02 6521 445 CGR 01 0 02 6521 445 CGR 
                     01 0 02 6521 235 DRR 
                     01 0 02 6521 411 MMS 
02 1 03 6632     DFJ 02 1 03 6632     JBB 
02 1 03 6632 231 DFK 02 1 03 6632 231 DFK 
02 1 03 6632 553 DFL 02 1 03 6632 553 DFL 
                     02 1 03 6632 111 KKK 
                     02 1 03 6632 332 BNM 

run;

so that one can see the changes in each row from year 2011 to 2012. And also see which new data has been included.

sorry my missover is not working properly. I hope you can still help.

Patrick · Posted 10-11-2021 10:38 PM

@Anita_n Below what you're asking for. I believe that not having the year in your variable name but adding a year variable instead would make things easier. And once you've just got variable names that are the same in all years don't merge the yearly data but append it. Often a long and narrow table structure is easier to work with.

data sampledata2011;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2011 $2. 
    @4  var2_2011 $1. 
    @6  var3_2011 $2. 
    @9  var4_2011 $4. 
    @14 var5_2011 $3. 
    @18 var5name_2011 $3.
    ;
  hash_key=put(md5(catx('|', of var1_2011--var5_2011)),$hex32.);
  datalines;
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
;

data sampledata2012;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2012 $2. 
    @4  var2_2012 $1. 
    @6  var3_2012 $2. 
    @9  var4_2012 $4. 
    @14 var5_2012 $3. 
    @18 var5name_2012 $3.
    ;  
  hash_key=put(md5(catx('|', of var1_2012--var5_2012)),$hex32.);
  datalines;
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
;

proc sort data=sampledata2011;
  by hash_key;
run;

proc sort data=sampledata2012;
  by hash_key;
run;

data want;
  merge sampledata2011 sampledata2012;
  by hash_key;
run;

View solution in original post

ErikLund_Jensen · Posted 10-11-2021 01:26 PM

Hi @Anita_n

First - what is the key that should be used to join the rows? - the full set of variables Var1-Var5?

If so, then why do you want to have the variables from both sides in the output? Would it not be better to have one set of the variables Var1-Var5 + Var5name_2011, Var5name_2012 and following years?

Given these assumptions are correct, the following code should give the wanted result:

data sampledata2011;
infile datalines missover;
input var1 $ var2 $ var3 $ var4 $ @14 var5 $char3. @18 var5name_2011 $;
datalines; 
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
run;


data sampledata2012;
infile datalines missover;
input var1 $ var2 $ var3 $ var4 $ @14 var5 $char3. @18 var5name_2012 $;
datalines; 
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;

proc sort data=sampledata2011; 
	by var1-var5;
run;

proc sort data=sampledata2012; 
	by var1-var5;
run;

data want;
	merge sampledata2011 sampledata2012;
	by var1-var5;
run;

Anita_n · Posted 10-11-2021 02:05 PM

@ErikLund_Jensen thanks for the reply. I need to have variables from both sides because there can be changes each year in each of the variables

ChrisNZ · Posted 10-11-2021 06:51 PM

Anita, this doesn't make sense.

If var5name_2012 is not missing, you know the VAR1-VAR5 entries on the line are in the 2012 file. No need to keep 2 copies.

High-Performance SAS Coding - Third Edition

Patrick · Posted 10-11-2021 10:38 PM

@Anita_n Below what you're asking for. I believe that not having the year in your variable name but adding a year variable instead would make things easier. And once you've just got variable names that are the same in all years don't merge the yearly data but append it. Often a long and narrow table structure is easier to work with.

data sampledata2011;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2011 $2. 
    @4  var2_2011 $1. 
    @6  var3_2011 $2. 
    @9  var4_2011 $4. 
    @14 var5_2011 $3. 
    @18 var5name_2011 $3.
    ;
  hash_key=put(md5(catx('|', of var1_2011--var5_2011)),$hex32.);
  datalines;
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
;

data sampledata2012;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2012 $2. 
    @4  var2_2012 $1. 
    @6  var3_2012 $2. 
    @9  var4_2012 $4. 
    @14 var5_2012 $3. 
    @18 var5name_2012 $3.
    ;  
  hash_key=put(md5(catx('|', of var1_2012--var5_2012)),$hex32.);
  datalines;
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
;

proc sort data=sampledata2011;
  by hash_key;
run;

proc sort data=sampledata2012;
  by hash_key;
run;

data want;
  merge sampledata2011 sampledata2012;
  by hash_key;
run;

Anita_n · Posted 10-12-2021 03:56 AM

@Patrick : Hello Patrick, that is wonderful, I just tried your code, it works perfectly. Thanks a lot 🙂

joining two tables

Re: joining two tables

Re: joining two tables

Re: joining two tables

Re: joining two tables

Re: joining two tables

Re: joining two tables

Register Today!

SAS Training: Just a Click Away