Dear all,
I have a question concerning how to compare two rows in SAS.
I have the following dataset. I wish to compare row1 with row2 and if var1, var2, var3 in row1 is the same as the corresponding variables in row2 then the value of the first var3name should be outputted in a new variable called var_new for all corresponding rows
data sample_data;
input var1 var2 var3 var3name $;
datalines;
01 0 01 ABN
01 0 01 ABS
01 0 01 ABE
01 0 02 CGM
01 0 02 CGN
01 0 02 CGM
01 0 02 CGR
02 1 03 DFJ
02 1 03 DFK
02 1 03 DFL
;
run;
The output should look like this below:
data want;
input var1 $ var2 $ var3 $ var3name $ var_new $;
datalines;
01 0 01 ABN ABN
01 0 01 ABS ABN
01 0 01 ABE ABN
01 0 02 CGM CGM
01 0 02 CGN CGM
01 0 02 CGM CGM
02 1 03 DFJ DFJ
02 1 03 DFK DFJ
02 1 03 DFL DFJ
;
run;
I will appreciate any help
Hello @Anita_n,
Use BY-group processing and retain variable var_new:
data want;
set sample_data;
by var1-var3;
if first.var3 then var_new=var3name;
retain var_new;
run;
You haven't specified the value of var_new in the case that, e.g., the first and second row differ in var1, var2 or var3. My suggested code treats the first row in this case as a group comprised of only one observation, i.e., the value of var3name is copied to var_new, as always in the first observation of a group.
(Note that your WANT and SAMPLE_DATA are not exactly consistent.)
Edit:
If your real variable names do not have a common prefix followed by "1", "2", "3," write them out in the BY statement, e.g.,
by red green blue;
If your real data are not sorted by var1 var2 var3, but you still want to compare only consecutive observations, add the NOTSORTED option to the BY statement:
by var1-var3 notsorted;
Hello @Anita_n,
Use BY-group processing and retain variable var_new:
data want;
set sample_data;
by var1-var3;
if first.var3 then var_new=var3name;
retain var_new;
run;
You haven't specified the value of var_new in the case that, e.g., the first and second row differ in var1, var2 or var3. My suggested code treats the first row in this case as a group comprised of only one observation, i.e., the value of var3name is copied to var_new, as always in the first observation of a group.
(Note that your WANT and SAMPLE_DATA are not exactly consistent.)
Edit:
If your real variable names do not have a common prefix followed by "1", "2", "3," write them out in the BY statement, e.g.,
by red green blue;
If your real data are not sorted by var1 var2 var3, but you still want to compare only consecutive observations, add the NOTSORTED option to the BY statement:
by var1-var3 notsorted;
@FreelanceReinh thanks for the quick reply. Yes, I will forgot to specify the case where row1 and row2 differs in var1, var2, and var3, could you suggest how I should go about such cases please. Thanks a lot
You can define the value of var_new in this case as appropriate for your application, e.g., use the value of var3name (as my suggested code does), leave it missing, set it to a constant value (e.g., "unique") or to a value indicating which of the three variables var1, var2 and var3 differ, etc. The code follows the specifications.
thanks, your code worked
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.