Hi,
I'm looking to compare the below 5 variables in a dataset and flag if they similar or not. The problem i am having is the blank values. I dont want to compare against blank values, so for row 2 & 5 I want the flag to be 'N'. So, if all the variables has same values i want the flag to be 'Y' if not 'N'
Sample data
Data test;
Infile cards;
Input ID $3. @5 var1 $1. @7 var2 $1. @9 var3 $1. @11 var4 $1. @13 var5 $1.;
cards;
123 A Z A A A
345 Z
178 A A A A A
908 A A Z
222 A
;
Run;
Expected Output
Flag
Y
N
N
Y
N
Any help is much appreciated!!
Thanks!
So, hi. I got a very poorly written code that does what you need...
but...
Data test;
Infile cards;
Input ID $3. @5 var1 $1. @7 var2 $1. @9 var3 $1. @11 var4 $1. @13 var5 $1.;
cards;
123 A Z A A A
345 Z
178 A A A A A
908 A A Z
222 A
;
Run;
data test2(drop = var:);
set test;
if var1 ne "" & var2 ne "" then f1 = compare(var1,var2);
if var3 ne "" & var2 ne "" then f2 = compare(var2,var3);
if var3 ne "" & var4 ne "" then f3 = compare(var3,var4);
if var4 ne "" & var5 ne "" then f4 = compare(var4,var5);
if f1 = f2 = f3 = f4 then flag= "N";else flag = "Y";
run;
I wanted to search for a function or something that does a pair-wise comparison of strings.
I keep looking.
Here is an updated (and hopefully truly doing pairwise comparison):
proc fcmp outlib = sasuser.funcs.trial;
function pairwise(x $,y $);
if x ne "" & y ne "" then
f = compare(x,y);
return (f);
endsub;
options cmplib=sasuser.funcs;
data test2;
set test;
f1 = pairwise(var1,var2);
f2 = pairwise(var2,var3);
f3 = pairwise(var3,var4);
f4 = pairwise(var4,var5);
f5 = pairwise(var1,var3);
f6 = pairwise(var1,var4);
f7 = pairwise(var1,var5);
f8 = pairwise(var2,var4);
f9 = pairwise(var2,var5);
f10 = pairwise(var3,var5);
if f1 = f2 = f3 = f4 = f5 = f6 = f7 = f8 = f9 = f10 then grp = "N";else grp = "Y";
run;
Best of luck,
Anca
What are the comparison rules?
What makes record 1 Y and record 3 N?
Since the 4th record has 2 blank values why is it Y?
Record 1 is Y because all the variables are not the same. var2 is different compared to var1,var3,var4 & var5 but in records 3 all the 5 variable are the same.. So if it is different compared to any one variable(except for blanks) I want to flag it as 'Y' if not 'N'.
Record 4 is N because var3 is different compared to var1 and var2. As var 4 and var 5 are blank I am not comparing with it.
In case of record 2 and 5 I want it as N because i don't want to compare if it is BLANK.
Hope I answered your question. Let me know if you have any additional question.
So, hi. I got a very poorly written code that does what you need...
but...
Data test;
Infile cards;
Input ID $3. @5 var1 $1. @7 var2 $1. @9 var3 $1. @11 var4 $1. @13 var5 $1.;
cards;
123 A Z A A A
345 Z
178 A A A A A
908 A A Z
222 A
;
Run;
data test2(drop = var:);
set test;
if var1 ne "" & var2 ne "" then f1 = compare(var1,var2);
if var3 ne "" & var2 ne "" then f2 = compare(var2,var3);
if var3 ne "" & var4 ne "" then f3 = compare(var3,var4);
if var4 ne "" & var5 ne "" then f4 = compare(var4,var5);
if f1 = f2 = f3 = f4 then flag= "N";else flag = "Y";
run;
I wanted to search for a function or something that does a pair-wise comparison of strings.
I keep looking.
Here is an updated (and hopefully truly doing pairwise comparison):
proc fcmp outlib = sasuser.funcs.trial;
function pairwise(x $,y $);
if x ne "" & y ne "" then
f = compare(x,y);
return (f);
endsub;
options cmplib=sasuser.funcs;
data test2;
set test;
f1 = pairwise(var1,var2);
f2 = pairwise(var2,var3);
f3 = pairwise(var3,var4);
f4 = pairwise(var4,var5);
f5 = pairwise(var1,var3);
f6 = pairwise(var1,var4);
f7 = pairwise(var1,var5);
f8 = pairwise(var2,var4);
f9 = pairwise(var2,var5);
f10 = pairwise(var3,var5);
if f1 = f2 = f3 = f4 = f5 = f6 = f7 = f8 = f9 = f10 then grp = "N";else grp = "Y";
run;
Best of luck,
Anca
Anca - It worked. I got the results i was looking for. Thank You!.
Can you please tell me what the trials is doing in the sasuser.funcs.trial? I tried without the trials and it works the same.
An easier way to approach it is to use compress(), temp var is kept for your convenience.
Data test;
Infile cards;
Input ID $3. @5 var1 $1. @7 var2 $1. @9 var3 $1. @11 var4 $1. @13 var5 $1.;
_cat=cats(of var:);
flag=ifc(missing(compress(_cat,first(_cat))),'N','Y');
cards;
123 A Z A A A
345 . . Z
178 A A A A A
908 A A Z
222 . . A
;
Run;
Haikuo
Haikuo - Your solution is very simple and quick. Thank you very much!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.