BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Anita_n
Pyrite | Level 9

Dear all, 

I have a problem which is giving me headache. I will appreciate any help.

I have this dataset

data sampledata2011;
infile datalines missover;
input var1_2011 $ var2_2011 $ var3_2011 $ var4_2011 $ var5_2011 $ var5name_2011 $;
datalines; 
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
run;

I need to merge it with this dataset

data sampledata2012;
infile datalines missover;
input var1_2012 $ var2_2012 $ var3_2012 $ var4_2012 $ var5_2012 $ var5name_2012 $;
datalines; 
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;

dataset "sampledata2012" include changes which have been made to sampledata2011 like in the first row var5name has been changed from ABN to NCB. It also contains more datalines. 

I want to have at the end a data that looks like this

 

data want;
infile datalines missover;
input var1_2011 $ var2_2011 $ var3_2011 $ var4_2011 $ var5_2011 $ var5name_2011 $
var1_2012 $ var2_2012 $ var3_2012 $ var4_2012 $ var5_2012 $ var5name_2012 $;
datalines; 
01 0 58 5811     ABN 01 0 58 5811     NCB 
01 0 58 5811 007 ABS 01 0 58 5811 007 ABS 
01 0 58 5811 123 ABE 01 0 58 5811 123 ABE 
01 0 58 5811 459 ABS 01 0 58 5811 459 ABS 
01 0 58 5811 326 ABE 01 0 58 5811 326 ABE 
01 0 02 6521     CGM 01 0 02 6521     CGM 
01 0 02 6521 001 CGN 01 0 02 6521 001 CGN 
01 0 02 6521 822 CGM 01 0 02 6521 822 CGM 
01 0 02 6521 445 CGR 01 0 02 6521 445 CGR 
                     01 0 02 6521 235 DRR 
                     01 0 02 6521 411 MMS 
02 1 03 6632     DFJ 02 1 03 6632     JBB 
02 1 03 6632 231 DFK 02 1 03 6632 231 DFK 
02 1 03 6632 553 DFL 02 1 03 6632 553 DFL 
                     02 1 03 6632 111 KKK 
                     02 1 03 6632 332 BNM 

run;

so that one can see the changes in each row from year 2011 to 2012. And also see which new data has been included.

sorry my missover is not working properly. I hope you can still help.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Anita_n Below what you're asking for. I believe that not having the year in your variable name but adding a year variable instead would make things easier. And once you've just got variable names that are the same in all years don't merge the yearly data but append it. Often a long and narrow table structure is easier to work with.

data sampledata2011;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2011 $2. 
    @4  var2_2011 $1. 
    @6  var3_2011 $2. 
    @9  var4_2011 $4. 
    @14 var5_2011 $3. 
    @18 var5name_2011 $3.
    ;
  hash_key=put(md5(catx('|', of var1_2011--var5_2011)),$hex32.);
  datalines;
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
;

data sampledata2012;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2012 $2. 
    @4  var2_2012 $1. 
    @6  var3_2012 $2. 
    @9  var4_2012 $4. 
    @14 var5_2012 $3. 
    @18 var5name_2012 $3.
    ;  
  hash_key=put(md5(catx('|', of var1_2012--var5_2012)),$hex32.);
  datalines;
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
;

proc sort data=sampledata2011;
  by hash_key;
run;

proc sort data=sampledata2012;
  by hash_key;
run;

data want;
  merge sampledata2011 sampledata2012;
  by hash_key;
run;

View solution in original post

5 REPLIES 5
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Anita_n 

 

First - what is the key that should be used to join the rows? - the full set of variables Var1-Var5?

 

If so, then why do you want to have the variables from both sides in the output? Would it not be better to have one set of the variables Var1-Var5 + Var5name_2011, Var5name_2012 and following years? 

 

Given these assumptions are correct, the following code should give the wanted result:

 

data sampledata2011;
infile datalines missover;
input var1 $ var2 $ var3 $ var4 $ @14 var5 $char3. @18 var5name_2011 $;
datalines; 
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
run;


data sampledata2012;
infile datalines missover;
input var1 $ var2 $ var3 $ var4 $ @14 var5 $char3. @18 var5name_2012 $;
datalines; 
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;

proc sort data=sampledata2011; 
	by var1-var5;
run;

proc sort data=sampledata2012; 
	by var1-var5;
run;

data want;
	merge sampledata2011 sampledata2012;
	by var1-var5;
run;
Anita_n
Pyrite | Level 9

@ErikLund_Jensen thanks for the reply. I need to have variables from both sides because there can be changes each year in each of the variables

ChrisNZ
Tourmaline | Level 20

Anita, this doesn't make sense. 

If   var5name_2012    is not missing, you know the VAR1-VAR5 entries on the line are in the 2012 file. No need to keep 2 copies.

Patrick
Opal | Level 21

@Anita_n Below what you're asking for. I believe that not having the year in your variable name but adding a year variable instead would make things easier. And once you've just got variable names that are the same in all years don't merge the yearly data but append it. Often a long and narrow table structure is easier to work with.

data sampledata2011;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2011 $2. 
    @4  var2_2011 $1. 
    @6  var3_2011 $2. 
    @9  var4_2011 $4. 
    @14 var5_2011 $3. 
    @18 var5name_2011 $3.
    ;
  hash_key=put(md5(catx('|', of var1_2011--var5_2011)),$hex32.);
  datalines;
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
;

data sampledata2012;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2012 $2. 
    @4  var2_2012 $1. 
    @6  var3_2012 $2. 
    @9  var4_2012 $4. 
    @14 var5_2012 $3. 
    @18 var5name_2012 $3.
    ;  
  hash_key=put(md5(catx('|', of var1_2012--var5_2012)),$hex32.);
  datalines;
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
;

proc sort data=sampledata2011;
  by hash_key;
run;

proc sort data=sampledata2012;
  by hash_key;
run;

data want;
  merge sampledata2011 sampledata2012;
  by hash_key;
run;
Anita_n
Pyrite | Level 9

@Patrick : Hello Patrick, that is wonderful, I just tried your code, it works perfectly. Thanks a lot 🙂

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1024 views
  • 1 like
  • 4 in conversation