BookmarkSubscribeRSS Feed
rsva
Fluorite | Level 6

Below is my sample data. All variables are character variables. I want to compare v1-v4. For e.g. If v1 = v2, v3, v4, v5 or v1 ne v2, v3, v4, v5 Overall goal is to see if the values were same in all 5 variables or not Id V1 V2 V3 v4 ------------------------------------------------------------- 23 1 3 4 . 4 3 3 . . 5 5 . . . 78 7 5 3 . 246 9 9 . . 778 1 1 . . 36 3 2 . . 476 4 . . . 87 5 2 1 . 4779 8 8 . . 4587689 2 6 . . 3457 1 . . . 6879 4 4 . . Is there a simpler way to do so without a bunch of if-then statements? Thanks in advance. Aruna.

14 REPLIES 14
DLing
Obsidian | Level 7

Simple idea:  test if the range is 0 or not.  If it is 0, then all non-missing values are the same.  If not, there is spread amongst the values.  Try this:

data same different;

     set have;

     if range(of v1-v5) = 0 then output same;

     else output different;

run;

Note that if there are missing values in the arguments, this WON'T detect its presence.  You'll need additional logic to handle.

Message was edited by: Daymond Ling

art297
Opal | Level 21

You said that they were all character variable, but then showed numbers in your example.  Are they all numbers in a character variable? and, if so, are they all integers?

ballardw
Super User

A kludge which works if numeric or character:

rc = sum ( (v1 ne v2),(v1 ne v3),(v1 ne v4),(v1 ne v5));

rc will have the number of mismatches.

This uses the behavior of SAS to return 1 for a true logical comparison and 0 for a false. Care needs to be exercised for missing or blanks.

art297
Opal | Level 21

Or, taking the kludge concept one step further, in the following sum will be equal to the number of matches between any of the pairs:

data have;

  informat v1-v5 $8.;

  input v1-v5;

  cards;

12 A B 14 12

C X 12 X 5

1 2 3 4 5

a b c d e

a a 3 b b

;

data want (drop=i j);

  set have;

  array vars $8. v1-v5;

  sum=0;

  do i=1 to dim(vars)-1;

    do j=i+1 to dim(vars);

      sum=sum (sum, vars(i) eq vars(j));

    end;

  end;

run;

MikeZdeb
Rhodochrosite | Level 12

Hi ... if you don't mind reading the data set twice, here's another idea that takes into account possible missing values ...

variable SAME is 1 if all values are the same, otherwise it's 0 ...

data want

set have;

call sortc (of v:);

same = ifn (coalescec(of v:) eq v5, 1 , 0);

set have;

run;

ChrisNZ
Tourmaline | Level 20

Wow! Why doesn't coalescec always return v1 if v1 isn't missing in this case?

Linlin
Lapis Lazuli | Level 10

Because "call sortc (of v: )" has sorted the variables.

ChrisNZ
Tourmaline | Level 20

?? Linlin.

I understand now (is my brain getting slower?), coalescec does return v1 and if is is the same as v5, then they are all equal. Smart! (and obvious once you understand, as always for smart solutions).

And if the OP wants to take missing values into account or if there are none, this simply becomes

same = ifn (v1 eq v5, 1 , 0);

Thanks for this Mike, an innovative (for me) way to use sortc() across dataset variables.

Your solution is equivalent to

same = ifn (min(of v:) eq max(of v:), 1 , 0);

except it works for strings and requires a prior sort.

Ksharp
Super User

Do you like Hash Table?

data have;
  informat v1-v5 $8.;
  input v1-v5;
  cards;
12 A B 14 12
C X 12 X 5
1 2 3 4 5
a b c d e
a a a a a
;
run;
data want(drop=i k);
 set have;
 declare hash ha (hashexp:10);
  ha.definekey('k');
  ha.definedone();
 array _v{*} v:;
 do i=1 to dim(_v);
  k=_v{i};ha.replace();
 end;
 flag=ifc(ha.num_items=1,'All Same       ','Not All Same');
run;

Ksharp

Ksharp
Super User

I found array is more simple.

data have;
  informat v1-v5 $8.;
  input v1-v5;
  cards;
12 A B 14 12
C X 12 X 5
1 1 1 1 1
a b c d e
a a a a a
;
run;
data want(drop=count i);
 set have;
 array _v{*} v:;
 count=0;
 do i=1 to dim(_v);
  count+(_v{1}=_v{i});
 end;
 flag=ifc(count=dim(_v),'All Same       ','Not All Same');
run;

Ksharp

rsva
Fluorite | Level 6

Thanks a lot for all your responses. They all seem to work on my test dataset. I'll get back with the results from my original dataset. Thanks again.

rsva
Fluorite | Level 6

Good Morning, Once again thanks for all the wonderfull suggestions. I tried all the suggestions. The code that fit my data and the purpose is below. if range(of nrace1-nrace5) = 0 then Race1_5='Same      ';     else Race1_5= 'Different'; if range(of nrace1-nrace5) = . then Race1_5='Same      '; Other suggestons also worked. But the above code seem to be a better fit for my data maily due to missing values. Thanks.

ChrisNZ
Tourmaline | Level 20

Note that you can write your existing logic as

RACE1_5 = ifc (range(of NRACE1-NRACE5) in(., 0), 'Same', 'Different' );

MikeZdeb
Rhodochrosite | Level 12

Hi ... but you did say "All variables are character variables" , yes/no?, and range is specific to numeric data.

If they are character variables but the values are numerals, it will work, otherwise no.

data x;

input (race1-race5) (: $1.);

datalines;

1 1 1 1 1

1 2 3 4 5

1 . 1 . 1

a b c d e

;

run;

data x;

set x;

same = ifc ( range(of r:)  , 'different' , 'same');

run;

race1    race2    race3    race4    race5    same

  1        1        1        1        1      same

  1        2        3        4        5      different

  1                 1                 1      same

  a        b        c        d        e      same

Then there's the LOG messages ... the first NOTE is why it works with numerals.  The others are why it does not work with other values.

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).

      2360:24

NOTE: Invalid numeric data, race1='a' , at line 2360 column 24.

NOTE: Invalid numeric data, race2='b' , at line 2360 column 24.

NOTE: Invalid numeric data, race3='c' , at line 2360 column 24.

NOTE: Invalid numeric data, race4='d' , at line 2360 column 24.

NOTE: Invalid numeric data, race5='e' , at line 2360 column 24.

race1=a race2=b race3=c race4=d race5=e same=same _ERROR_=1 _N_=4

NOTE: Missing values were generated as a result of performing an operation on missing values.

      Each place is given by: (Number of times) at (Line):(Column).

      1 at 2360:14

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 34746 views
  • 2 likes
  • 8 in conversation