BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Anita_n
Pyrite | Level 9

Dear all, 

I have a problem which is giving me headache. I will appreciate any help.

I have this dataset

data sampledata2011;
infile datalines missover;
input var1_2011 $ var2_2011 $ var3_2011 $ var4_2011 $ var5_2011 $ var5name_2011 $;
datalines; 
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
run;

I need to merge it with this dataset

data sampledata2012;
infile datalines missover;
input var1_2012 $ var2_2012 $ var3_2012 $ var4_2012 $ var5_2012 $ var5name_2012 $;
datalines; 
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;

dataset "sampledata2012" include changes which have been made to sampledata2011 like in the first row var5name has been changed from ABN to NCB. It also contains more datalines. 

I want to have at the end a data that looks like this

 

data want;
infile datalines missover;
input var1_2011 $ var2_2011 $ var3_2011 $ var4_2011 $ var5_2011 $ var5name_2011 $
var1_2012 $ var2_2012 $ var3_2012 $ var4_2012 $ var5_2012 $ var5name_2012 $;
datalines; 
01 0 58 5811     ABN 01 0 58 5811     NCB 
01 0 58 5811 007 ABS 01 0 58 5811 007 ABS 
01 0 58 5811 123 ABE 01 0 58 5811 123 ABE 
01 0 58 5811 459 ABS 01 0 58 5811 459 ABS 
01 0 58 5811 326 ABE 01 0 58 5811 326 ABE 
01 0 02 6521     CGM 01 0 02 6521     CGM 
01 0 02 6521 001 CGN 01 0 02 6521 001 CGN 
01 0 02 6521 822 CGM 01 0 02 6521 822 CGM 
01 0 02 6521 445 CGR 01 0 02 6521 445 CGR 
                     01 0 02 6521 235 DRR 
                     01 0 02 6521 411 MMS 
02 1 03 6632     DFJ 02 1 03 6632     JBB 
02 1 03 6632 231 DFK 02 1 03 6632 231 DFK 
02 1 03 6632 553 DFL 02 1 03 6632 553 DFL 
                     02 1 03 6632 111 KKK 
                     02 1 03 6632 332 BNM 

run;

so that one can see the changes in each row from year 2011 to 2012. And also see which new data has been included.

sorry my missover is not working properly. I hope you can still help.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Anita_n Below what you're asking for. I believe that not having the year in your variable name but adding a year variable instead would make things easier. And once you've just got variable names that are the same in all years don't merge the yearly data but append it. Often a long and narrow table structure is easier to work with.

data sampledata2011;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2011 $2. 
    @4  var2_2011 $1. 
    @6  var3_2011 $2. 
    @9  var4_2011 $4. 
    @14 var5_2011 $3. 
    @18 var5name_2011 $3.
    ;
  hash_key=put(md5(catx('|', of var1_2011--var5_2011)),$hex32.);
  datalines;
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
;

data sampledata2012;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2012 $2. 
    @4  var2_2012 $1. 
    @6  var3_2012 $2. 
    @9  var4_2012 $4. 
    @14 var5_2012 $3. 
    @18 var5name_2012 $3.
    ;  
  hash_key=put(md5(catx('|', of var1_2012--var5_2012)),$hex32.);
  datalines;
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
;

proc sort data=sampledata2011;
  by hash_key;
run;

proc sort data=sampledata2012;
  by hash_key;
run;

data want;
  merge sampledata2011 sampledata2012;
  by hash_key;
run;

View solution in original post

5 REPLIES 5
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Anita_n 

 

First - what is the key that should be used to join the rows? - the full set of variables Var1-Var5?

 

If so, then why do you want to have the variables from both sides in the output? Would it not be better to have one set of the variables Var1-Var5 + Var5name_2011, Var5name_2012 and following years? 

 

Given these assumptions are correct, the following code should give the wanted result:

 

data sampledata2011;
infile datalines missover;
input var1 $ var2 $ var3 $ var4 $ @14 var5 $char3. @18 var5name_2011 $;
datalines; 
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
run;


data sampledata2012;
infile datalines missover;
input var1 $ var2 $ var3 $ var4 $ @14 var5 $char3. @18 var5name_2012 $;
datalines; 
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;

proc sort data=sampledata2011; 
	by var1-var5;
run;

proc sort data=sampledata2012; 
	by var1-var5;
run;

data want;
	merge sampledata2011 sampledata2012;
	by var1-var5;
run;
Anita_n
Pyrite | Level 9

@ErikLund_Jensen thanks for the reply. I need to have variables from both sides because there can be changes each year in each of the variables

ChrisNZ
Tourmaline | Level 20

Anita, this doesn't make sense. 

If   var5name_2012    is not missing, you know the VAR1-VAR5 entries on the line are in the 2012 file. No need to keep 2 copies.

Patrick
Opal | Level 21

@Anita_n Below what you're asking for. I believe that not having the year in your variable name but adding a year variable instead would make things easier. And once you've just got variable names that are the same in all years don't merge the yearly data but append it. Often a long and narrow table structure is easier to work with.

data sampledata2011;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2011 $2. 
    @4  var2_2011 $1. 
    @6  var3_2011 $2. 
    @9  var4_2011 $4. 
    @14 var5_2011 $3. 
    @18 var5name_2011 $3.
    ;
  hash_key=put(md5(catx('|', of var1_2011--var5_2011)),$hex32.);
  datalines;
01 0 58 5811     ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632     DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
;

data sampledata2012;
  length hash_key $32;
  infile datalines truncover;
  input 
    @1  var1_2012 $2. 
    @4  var2_2012 $1. 
    @6  var3_2012 $2. 
    @9  var4_2012 $4. 
    @14 var5_2012 $3. 
    @18 var5name_2012 $3.
    ;  
  hash_key=put(md5(catx('|', of var1_2012--var5_2012)),$hex32.);
  datalines;
01 0 58 5811     NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521     CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632     JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
;

proc sort data=sampledata2011;
  by hash_key;
run;

proc sort data=sampledata2012;
  by hash_key;
run;

data want;
  merge sampledata2011 sampledata2012;
  by hash_key;
run;
Anita_n
Pyrite | Level 9

@Patrick : Hello Patrick, that is wonderful, I just tried your code, it works perfectly. Thanks a lot 🙂

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 678 views
  • 1 like
  • 4 in conversation