Below is my sample data. All variables are character variables. I want to compare v1-v4. For e.g. If v1 = v2, v3, v4, v5 or v1 ne v2, v3, v4, v5 Overall goal is to see if the values were same in all 5 variables or not Id V1 V2 V3 v4 ------------------------------------------------------------- 23 1 3 4 . 4 3 3 . . 5 5 . . . 78 7 5 3 . 246 9 9 . . 778 1 1 . . 36 3 2 . . 476 4 . . . 87 5 2 1 . 4779 8 8 . . 4587689 2 6 . . 3457 1 . . . 6879 4 4 . . Is there a simpler way to do so without a bunch of if-then statements? Thanks in advance. Aruna.
Simple idea: test if the range is 0 or not. If it is 0, then all non-missing values are the same. If not, there is spread amongst the values. Try this:
data same different;
set have;
if range(of v1-v5) = 0 then output same;
else output different;
run;
Note that if there are missing values in the arguments, this WON'T detect its presence. You'll need additional logic to handle.
Message was edited by: Daymond Ling
You said that they were all character variable, but then showed numbers in your example. Are they all numbers in a character variable? and, if so, are they all integers?
A kludge which works if numeric or character:
rc = sum ( (v1 ne v2),(v1 ne v3),(v1 ne v4),(v1 ne v5));
rc will have the number of mismatches.
This uses the behavior of SAS to return 1 for a true logical comparison and 0 for a false. Care needs to be exercised for missing or blanks.
Or, taking the kludge concept one step further, in the following sum will be equal to the number of matches between any of the pairs:
data have;
informat v1-v5 $8.;
input v1-v5;
cards;
12 A B 14 12
C X 12 X 5
1 2 3 4 5
a b c d e
a a 3 b b
;
data want (drop=i j);
set have;
array vars $8. v1-v5;
sum=0;
do i=1 to dim(vars)-1;
do j=i+1 to dim(vars);
sum=sum (sum, vars(i) eq vars(j));
end;
end;
run;
Hi ... if you don't mind reading the data set twice, here's another idea that takes into account possible missing values ...
variable SAME is 1 if all values are the same, otherwise it's 0 ...
data want
set have;
call sortc (of v:);
same = ifn (coalescec(of v:) eq v5, 1 , 0);
set have;
run;
Wow! Why doesn't coalescec always return v1 if v1 isn't missing in this case?
Because "call sortc (of v: )" has sorted the variables.
?? Linlin.
I understand now (is my brain getting slower?), coalescec does return v1 and if is is the same as v5, then they are all equal. Smart! (and obvious once you understand, as always for smart solutions).
And if the OP wants to take missing values into account or if there are none, this simply becomes
same = ifn (v1 eq v5, 1 , 0);
Thanks for this Mike, an innovative (for me) way to use sortc() across dataset variables.
Your solution is equivalent to
same = ifn (min(of v:) eq max(of v:), 1 , 0);
except it works for strings and requires a prior sort.
Do you like Hash Table?
data have; informat v1-v5 $8.; input v1-v5; cards; 12 A B 14 12 C X 12 X 5 1 2 3 4 5 a b c d e a a a a a ; run; data want(drop=i k); set have; declare hash ha (hashexp:10); ha.definekey('k'); ha.definedone(); array _v{*} v:; do i=1 to dim(_v); k=_v{i};ha.replace(); end; flag=ifc(ha.num_items=1,'All Same ','Not All Same'); run;
Ksharp
I found array is more simple.
data have; informat v1-v5 $8.; input v1-v5; cards; 12 A B 14 12 C X 12 X 5 1 1 1 1 1 a b c d e a a a a a ; run; data want(drop=count i); set have; array _v{*} v:; count=0; do i=1 to dim(_v); count+(_v{1}=_v{i}); end; flag=ifc(count=dim(_v),'All Same ','Not All Same'); run;
Ksharp
Thanks a lot for all your responses. They all seem to work on my test dataset. I'll get back with the results from my original dataset. Thanks again.
Good Morning, Once again thanks for all the wonderfull suggestions. I tried all the suggestions. The code that fit my data and the purpose is below. if range(of nrace1-nrace5) = 0 then Race1_5='Same '; else Race1_5= 'Different'; if range(of nrace1-nrace5) = . then Race1_5='Same '; Other suggestons also worked. But the above code seem to be a better fit for my data maily due to missing values. Thanks.
Note that you can write your existing logic as
RACE1_5 = ifc (range(of NRACE1-NRACE5) in(., 0), 'Same', 'Different' );
Hi ... but you did say "All variables are character variables" , yes/no?, and range is specific to numeric data.
If they are character variables but the values are numerals, it will work, otherwise no.
data x;
input (race1-race5) (: $1.);
datalines;
1 1 1 1 1
1 2 3 4 5
1 . 1 . 1
a b c d e
;
run;
data x;
set x;
same = ifc ( range(of r:) , 'different' , 'same');
run;
race1 race2 race3 race4 race5 same
1 1 1 1 1 same
1 2 3 4 5 different
1 1 1 same
a b c d e same
Then there's the LOG messages ... the first NOTE is why it works with numerals. The others are why it does not work with other values.
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
2360:24
NOTE: Invalid numeric data, race1='a' , at line 2360 column 24.
NOTE: Invalid numeric data, race2='b' , at line 2360 column 24.
NOTE: Invalid numeric data, race3='c' , at line 2360 column 24.
NOTE: Invalid numeric data, race4='d' , at line 2360 column 24.
NOTE: Invalid numeric data, race5='e' , at line 2360 column 24.
race1=a race2=b race3=c race4=d race5=e same=same _ERROR_=1 _N_=4
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 2360:14
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.