DATA Step, Macro, Functions and more

Compare values across multiple variables

Reply
Contributor
Posts: 38

Compare values across multiple variables

Below is my sample data. All variables are character variables. I want to compare v1-v4. For e.g. If v1 = v2, v3, v4, v5 or v1 ne v2, v3, v4, v5 Overall goal is to see if the values were same in all 5 variables or not Id V1 V2 V3 v4 ------------------------------------------------------------- 23 1 3 4 . 4 3 3 . . 5 5 . . . 78 7 5 3 . 246 9 9 . . 778 1 1 . . 36 3 2 . . 476 4 . . . 87 5 2 1 . 4779 8 8 . . 4587689 2 6 . . 3457 1 . . . 6879 4 4 . . Is there a simpler way to do so without a bunch of if-then statements? Thanks in advance. Aruna.

Frequent Contributor
Posts: 104

Re: Compare values across multiple variables

Simple idea:  test if the range is 0 or not.  If it is 0, then all non-missing values are the same.  If not, there is spread amongst the values.  Try this:

data same different;

     set have;

     if range(of v1-v5) = 0 then output same;

     else output different;

run;

Note that if there are missing values in the arguments, this WON'T detect its presence.  You'll need additional logic to handle.

Message was edited by: Daymond Ling

PROC Star
Posts: 7,486

Re: Compare values across multiple variables

You said that they were all character variable, but then showed numbers in your example.  Are they all numbers in a character variable? and, if so, are they all integers?

Super User
Posts: 11,343

Re: Compare values across multiple variables

A kludge which works if numeric or character:

rc = sum ( (v1 ne v2),(v1 ne v3),(v1 ne v4),(v1 ne v5));

rc will have the number of mismatches.

This uses the behavior of SAS to return 1 for a true logical comparison and 0 for a false. Care needs to be exercised for missing or blanks.

PROC Star
Posts: 7,486

Re: Compare values across multiple variables

Or, taking the kludge concept one step further, in the following sum will be equal to the number of matches between any of the pairs:

data have;

  informat v1-v5 $8.;

  input v1-v5;

  cards;

12 A B 14 12

C X 12 X 5

1 2 3 4 5

a b c d e

a a 3 b b

;

data want (drop=i j);

  set have;

  array vars $8. v1-v5;

  sum=0;

  do i=1 to dim(vars)-1;

    do j=i+1 to dim(vars);

      sum=sum (sum, vars(i) eq vars(j));

    end;

  end;

run;

Valued Guide
Posts: 765

Re: Compare values across multiple variables

Hi ... if you don't mind reading the data set twice, here's another idea that takes into account possible missing values ...

variable SAME is 1 if all values are the same, otherwise it's 0 ...

data want

set have;

call sortc (of vSmiley Happy;

same = ifn (coalescec(of vSmiley Happy eq v5, 1 , 0);

set have;

run;

PROC Star
Posts: 1,760

Compare values across multiple variables

Wow! Why doesn't coalescec always return v1 if v1 isn't missing in this case?

Super Contributor
Posts: 1,636

Compare values across multiple variables

Because "call sortc (of v: )" has sorted the variables.

PROC Star
Posts: 1,760

Re: Compare values across multiple variables

?? Linlin.

I understand now (is my brain getting slower?), coalescec does return v1 and if is is the same as v5, then they are all equal. Smart! (and obvious once you understand, as always for smart solutions).

And if the OP wants to take missing values into account or if there are none, this simply becomes

same = ifn (v1 eq v5, 1 , 0);

Thanks for this Mike, an innovative (for me) way to use sortc() across dataset variables.

Your solution is equivalent to

same = ifn (min(of vSmiley Happy eq max(of vSmiley Happy, 1 , 0);

except it works for strings and requires a prior sort.

Super User
Posts: 10,041

Re: Compare values across multiple variables

Do you like Hash Table?

data have;
  informat v1-v5 $8.;
  input v1-v5;
  cards;
12 A B 14 12
C X 12 X 5
1 2 3 4 5
a b c d e
a a a a a
;
run;
data want(drop=i k);
 set have;
 declare hash ha (hashexp:10);
  ha.definekey('k');
  ha.definedone();
 array _v{*} v:;
 do i=1 to dim(_v);
  k=_v{i};ha.replace();
 end;
 flag=ifc(ha.num_items=1,'All Same       ','Not All Same');
run;

Ksharp

Super User
Posts: 10,041

Re: Compare values across multiple variables

I found array is more simple.

data have;
  informat v1-v5 $8.;
  input v1-v5;
  cards;
12 A B 14 12
C X 12 X 5
1 1 1 1 1
a b c d e
a a a a a
;
run;
data want(drop=count i);
 set have;
 array _v{*} v:;
 count=0;
 do i=1 to dim(_v);
  count+(_v{1}=_v{i});
 end;
 flag=ifc(count=dim(_v),'All Same       ','Not All Same');
run;

Ksharp

Contributor
Posts: 38

Re: Compare values across multiple variables

Thanks a lot for all your responses. They all seem to work on my test dataset. I'll get back with the results from my original dataset. Thanks again.

Contributor
Posts: 38

Compare values across multiple variables

Good Morning, Once again thanks for all the wonderfull suggestions. I tried all the suggestions. The code that fit my data and the purpose is below. if range(of nrace1-nrace5) = 0 then Race1_5='Same      ';     else Race1_5= 'Different'; if range(of nrace1-nrace5) = . then Race1_5='Same      '; Other suggestons also worked. But the above code seem to be a better fit for my data maily due to missing values. Thanks.

PROC Star
Posts: 1,760

Re: Compare values across multiple variables

Note that you can write your existing logic as

RACE1_5 = ifc (range(of NRACE1-NRACE5) in(., 0), 'Same', 'Different' );

Valued Guide
Posts: 765

Re: Compare values across multiple variables

Hi ... but you did say "All variables are character variables" , yes/no?, and range is specific to numeric data.

If they are character variables but the values are numerals, it will work, otherwise no.

data x;

input (race1-race5) (: $1.);

datalines;

1 1 1 1 1

1 2 3 4 5

1 . 1 . 1

a b c d e

;

run;

data x;

set x;

same = ifc ( range(of rSmiley Happy  , 'different' , 'same');

run;

race1    race2    race3    race4    race5    same

  1        1        1        1        1      same

  1        2        3        4        5      different

  1                 1                 1      same

  a        b        c        d        e      same

Then there's the LOG messages ... the first NOTE is why it works with numerals.  The others are why it does not work with other values.

NOTE: Character values have been converted to numeric values at the places given by: (Line)Smiley SadColumn).

      2360:24

NOTE: Invalid numeric data, race1='a' , at line 2360 column 24.

NOTE: Invalid numeric data, race2='b' , at line 2360 column 24.

NOTE: Invalid numeric data, race3='c' , at line 2360 column 24.

NOTE: Invalid numeric data, race4='d' , at line 2360 column 24.

NOTE: Invalid numeric data, race5='e' , at line 2360 column 24.

race1=a race2=b race3=c race4=d race5=e same=same _ERROR_=1 _N_=4

NOTE: Missing values were generated as a result of performing an operation on missing values.

      Each place is given by: (Number of times) at (Line)Smiley SadColumn).

      1 at 2360:14

Ask a Question
Discussion stats
  • 14 replies
  • 15438 views
  • 1 like
  • 8 in conversation