Dear all,
I have a problem which is giving me headache. I will appreciate any help.
I have this dataset
data sampledata2011;
infile datalines missover;
input var1_2011 $ var2_2011 $ var3_2011 $ var4_2011 $ var5_2011 $ var5name_2011 $;
datalines;
01 0 58 5811 ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521 CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632 DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
run;
I need to merge it with this dataset
data sampledata2012;
infile datalines missover;
input var1_2012 $ var2_2012 $ var3_2012 $ var4_2012 $ var5_2012 $ var5name_2012 $;
datalines;
01 0 58 5811 NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521 CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632 JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;
dataset "sampledata2012" include changes which have been made to sampledata2011 like in the first row var5name has been changed from ABN to NCB. It also contains more datalines.
I want to have at the end a data that looks like this
data want;
infile datalines missover;
input var1_2011 $ var2_2011 $ var3_2011 $ var4_2011 $ var5_2011 $ var5name_2011 $
var1_2012 $ var2_2012 $ var3_2012 $ var4_2012 $ var5_2012 $ var5name_2012 $;
datalines;
01 0 58 5811 ABN 01 0 58 5811 NCB
01 0 58 5811 007 ABS 01 0 58 5811 007 ABS
01 0 58 5811 123 ABE 01 0 58 5811 123 ABE
01 0 58 5811 459 ABS 01 0 58 5811 459 ABS
01 0 58 5811 326 ABE 01 0 58 5811 326 ABE
01 0 02 6521 CGM 01 0 02 6521 CGM
01 0 02 6521 001 CGN 01 0 02 6521 001 CGN
01 0 02 6521 822 CGM 01 0 02 6521 822 CGM
01 0 02 6521 445 CGR 01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632 DFJ 02 1 03 6632 JBB
02 1 03 6632 231 DFK 02 1 03 6632 231 DFK
02 1 03 6632 553 DFL 02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;
so that one can see the changes in each row from year 2011 to 2012. And also see which new data has been included.
sorry my missover is not working properly. I hope you can still help.
@Anita_n Below what you're asking for. I believe that not having the year in your variable name but adding a year variable instead would make things easier. And once you've just got variable names that are the same in all years don't merge the yearly data but append it. Often a long and narrow table structure is easier to work with.
data sampledata2011;
length hash_key $32;
infile datalines truncover;
input
@1 var1_2011 $2.
@4 var2_2011 $1.
@6 var3_2011 $2.
@9 var4_2011 $4.
@14 var5_2011 $3.
@18 var5name_2011 $3.
;
hash_key=put(md5(catx('|', of var1_2011--var5_2011)),$hex32.);
datalines;
01 0 58 5811 ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521 CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632 DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
;
data sampledata2012;
length hash_key $32;
infile datalines truncover;
input
@1 var1_2012 $2.
@4 var2_2012 $1.
@6 var3_2012 $2.
@9 var4_2012 $4.
@14 var5_2012 $3.
@18 var5name_2012 $3.
;
hash_key=put(md5(catx('|', of var1_2012--var5_2012)),$hex32.);
datalines;
01 0 58 5811 NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521 CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632 JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
;
proc sort data=sampledata2011;
by hash_key;
run;
proc sort data=sampledata2012;
by hash_key;
run;
data want;
merge sampledata2011 sampledata2012;
by hash_key;
run;
Hi @Anita_n
First - what is the key that should be used to join the rows? - the full set of variables Var1-Var5?
If so, then why do you want to have the variables from both sides in the output? Would it not be better to have one set of the variables Var1-Var5 + Var5name_2011, Var5name_2012 and following years?
Given these assumptions are correct, the following code should give the wanted result:
data sampledata2011;
infile datalines missover;
input var1 $ var2 $ var3 $ var4 $ @14 var5 $char3. @18 var5name_2011 $;
datalines;
01 0 58 5811 ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521 CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632 DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
run;
data sampledata2012;
infile datalines missover;
input var1 $ var2 $ var3 $ var4 $ @14 var5 $char3. @18 var5name_2012 $;
datalines;
01 0 58 5811 NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521 CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632 JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
run;
proc sort data=sampledata2011;
by var1-var5;
run;
proc sort data=sampledata2012;
by var1-var5;
run;
data want;
merge sampledata2011 sampledata2012;
by var1-var5;
run;
@ErikLund_Jensen thanks for the reply. I need to have variables from both sides because there can be changes each year in each of the variables
Anita, this doesn't make sense.
If var5name_2012 is not missing, you know the VAR1-VAR5 entries on the line are in the 2012 file. No need to keep 2 copies.
@Anita_n Below what you're asking for. I believe that not having the year in your variable name but adding a year variable instead would make things easier. And once you've just got variable names that are the same in all years don't merge the yearly data but append it. Often a long and narrow table structure is easier to work with.
data sampledata2011;
length hash_key $32;
infile datalines truncover;
input
@1 var1_2011 $2.
@4 var2_2011 $1.
@6 var3_2011 $2.
@9 var4_2011 $4.
@14 var5_2011 $3.
@18 var5name_2011 $3.
;
hash_key=put(md5(catx('|', of var1_2011--var5_2011)),$hex32.);
datalines;
01 0 58 5811 ABN
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521 CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
02 1 03 6632 DFJ
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
;
data sampledata2012;
length hash_key $32;
infile datalines truncover;
input
@1 var1_2012 $2.
@4 var2_2012 $1.
@6 var3_2012 $2.
@9 var4_2012 $4.
@14 var5_2012 $3.
@18 var5name_2012 $3.
;
hash_key=put(md5(catx('|', of var1_2012--var5_2012)),$hex32.);
datalines;
01 0 58 5811 NCB
01 0 58 5811 007 ABS
01 0 58 5811 123 ABE
01 0 58 5811 459 ABS
01 0 58 5811 326 ABE
01 0 02 6521 CGM
01 0 02 6521 001 CGN
01 0 02 6521 822 CGM
01 0 02 6521 445 CGR
01 0 02 6521 235 DRR
01 0 02 6521 411 MMS
02 1 03 6632 JBB
02 1 03 6632 231 DFK
02 1 03 6632 553 DFL
02 1 03 6632 111 KKK
02 1 03 6632 332 BNM
;
proc sort data=sampledata2011;
by hash_key;
run;
proc sort data=sampledata2012;
by hash_key;
run;
data want;
merge sampledata2011 sampledata2012;
by hash_key;
run;
@Patrick : Hello Patrick, that is wonderful, I just tried your code, it works perfectly. Thanks a lot 🙂
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.