Hi,
I'm looking to compare the below 5 variables in a dataset and flag if they similar or not. The problem i am having is the blank values. I dont want to compare against blank values, so for row 2 & 5 I want the flag to be 'N'. So, if all the variables has same values i want the flag to be 'Y' if not 'N'
Sample data
Data test;
Infile cards;
Input ID $3. @5 var1 $1. @7 var2 $1. @9 var3 $1. @11 var4 $1. @13 var5 $1.;
cards;
123 A Z A A A
345 Z
178 A A A A A
908 A A Z
222 A
;
Run;
Expected Output
Flag
Y
N
N
Y
N
Any help is much appreciated!!
Thanks!
So, hi. I got a very poorly written code that does what you need...
but...
Data test;
Infile cards;
Input ID $3. @5 var1 $1. @7 var2 $1. @9 var3 $1. @11 var4 $1. @13 var5 $1.;
cards;
123 A Z A A A
345 Z
178 A A A A A
908 A A Z
222 A
;
Run;
data test2(drop = var:);
set test;
if var1 ne "" & var2 ne "" then f1 = compare(var1,var2);
if var3 ne "" & var2 ne "" then f2 = compare(var2,var3);
if var3 ne "" & var4 ne "" then f3 = compare(var3,var4);
if var4 ne "" & var5 ne "" then f4 = compare(var4,var5);
if f1 = f2 = f3 = f4 then flag= "N";else flag = "Y";
run;
I wanted to search for a function or something that does a pair-wise comparison of strings.
I keep looking.
Here is an updated (and hopefully truly doing pairwise comparison):
proc fcmp outlib = sasuser.funcs.trial;
function pairwise(x $,y $);
if x ne "" & y ne "" then
f = compare(x,y);
return (f);
endsub;
options cmplib=sasuser.funcs;
data test2;
set test;
f1 = pairwise(var1,var2);
f2 = pairwise(var2,var3);
f3 = pairwise(var3,var4);
f4 = pairwise(var4,var5);
f5 = pairwise(var1,var3);
f6 = pairwise(var1,var4);
f7 = pairwise(var1,var5);
f8 = pairwise(var2,var4);
f9 = pairwise(var2,var5);
f10 = pairwise(var3,var5);
if f1 = f2 = f3 = f4 = f5 = f6 = f7 = f8 = f9 = f10 then grp = "N";else grp = "Y";
run;
Best of luck,
Anca
What are the comparison rules?
What makes record 1 Y and record 3 N?
Since the 4th record has 2 blank values why is it Y?
Record 1 is Y because all the variables are not the same. var2 is different compared to var1,var3,var4 & var5 but in records 3 all the 5 variable are the same.. So if it is different compared to any one variable(except for blanks) I want to flag it as 'Y' if not 'N'.
Record 4 is N because var3 is different compared to var1 and var2. As var 4 and var 5 are blank I am not comparing with it.
In case of record 2 and 5 I want it as N because i don't want to compare if it is BLANK.
Hope I answered your question. Let me know if you have any additional question.
So, hi. I got a very poorly written code that does what you need...
but...
Data test;
Infile cards;
Input ID $3. @5 var1 $1. @7 var2 $1. @9 var3 $1. @11 var4 $1. @13 var5 $1.;
cards;
123 A Z A A A
345 Z
178 A A A A A
908 A A Z
222 A
;
Run;
data test2(drop = var:);
set test;
if var1 ne "" & var2 ne "" then f1 = compare(var1,var2);
if var3 ne "" & var2 ne "" then f2 = compare(var2,var3);
if var3 ne "" & var4 ne "" then f3 = compare(var3,var4);
if var4 ne "" & var5 ne "" then f4 = compare(var4,var5);
if f1 = f2 = f3 = f4 then flag= "N";else flag = "Y";
run;
I wanted to search for a function or something that does a pair-wise comparison of strings.
I keep looking.
Here is an updated (and hopefully truly doing pairwise comparison):
proc fcmp outlib = sasuser.funcs.trial;
function pairwise(x $,y $);
if x ne "" & y ne "" then
f = compare(x,y);
return (f);
endsub;
options cmplib=sasuser.funcs;
data test2;
set test;
f1 = pairwise(var1,var2);
f2 = pairwise(var2,var3);
f3 = pairwise(var3,var4);
f4 = pairwise(var4,var5);
f5 = pairwise(var1,var3);
f6 = pairwise(var1,var4);
f7 = pairwise(var1,var5);
f8 = pairwise(var2,var4);
f9 = pairwise(var2,var5);
f10 = pairwise(var3,var5);
if f1 = f2 = f3 = f4 = f5 = f6 = f7 = f8 = f9 = f10 then grp = "N";else grp = "Y";
run;
Best of luck,
Anca
Anca - It worked. I got the results i was looking for. Thank You!.
Can you please tell me what the trials is doing in the sasuser.funcs.trial? I tried without the trials and it works the same.
An easier way to approach it is to use compress(), temp var is kept for your convenience.
Data test;
Infile cards;
Input ID $3. @5 var1 $1. @7 var2 $1. @9 var3 $1. @11 var4 $1. @13 var5 $1.;
_cat=cats(of var:);
flag=ifc(missing(compress(_cat,first(_cat))),'N','Y');
cards;
123 A Z A A A
345 . . Z
178 A A A A A
908 A A Z
222 . . A
;
Run;
Haikuo
Haikuo - Your solution is very simple and quick. Thank you very much!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.