Valued Guide
Posts: 653

# Detecting unique values across variables

I have a list of numeric variables (always integers) and would like to determine when they have distinct values. In the following code the three variables are I, J, K, but in reality there can be a large number of variables. Thoughts on an easier way to replace the [pre]i ne j & i ne k & j ne k[/pre]

[pre]data alldiff;
do i = 1 to 4;
do j = 1 to 4;
do k = 1 to 4;
if i ne j & i ne k & j ne k then output alldiff;
end;end;end;
run;
proc print data=alldiff;
run;[/pre]
Regular Contributor
Posts: 241

## Re: Detecting unique values across variables

Here is one way. In the worst case, you will make 0.5*n*(n-1) comparisons per obs.

/* test data */

data one;

do i = 1 to 4;

do j = 1 to 4;

do k = 1 to 4;

output;

end;

end;

end;

run;

/* flag if all ijk values are different */

data two;

set one;

array var(*) i j k;

unique = 1

drop p q;

do p = 1 to dim(var)-1 while(unique);

do q = p + 1 to dim(var) while(unique);

unique = var(p)^=var(q);

end;

end

run;

/* check */

proc print data=two(obs=10);

run;

/* on lst

Obs    i    j    k    unique

1    1    1    1       0

2    1    1    2       0

3    1    1    3       0

4    1    1    4       0

5    1    2    1       0

6    1    2    2       0

7    1    2    3       1

8    1    2    4       1

9    1    3    1       0

10    1    3    2       1

*/

Posts: 3,852

## Re: Detecting unique values across variables

I don't think this is really any different from Chang's program just "more statements" :-)

I don't unstand why I had to define a data item for the hash in order to use the ADD method syntax with the KEY argument, h.add(key:v,data:' ')
It works without DEFINEDATA and the DATA argument but you have assign Y and then call the ADD method, y=v; rc=h.add(), but h.add(key:v) does not.

[pre]
data alldiff;
length y 8 d \$1;
call missing(y,d);
declare hash h();
h.definekey('y');
h.definedata('d');
h.definedone();
do i = 1 to 4;
do j = 1 to 4;
do k = 1 to 4;
h.clear();
array v
• i j k;
do l=1 to dim(v) until(h.add(key:v,data:' ') gt 0);
end;
unique = l eq dim(v)+1;
output;
end;
end;
end;
drop y d l;
run;
[/pre]
• Valued Guide
Posts: 653

## Re: Detecting unique values across variables

Thank you Chang and Data _NULL_. I was afraid (and hoping) that I might have been missing something like a uniquevarvalues function. For either solution there is quite a bit of checking that needs to take place. It will be interesting to see how the hash solution scales for larger data sets and numbers of vars.
Thanks
Art
Regular Contributor
Posts: 171

## Re: Detecting unique values across variables

I was also curious to see how the hash solution compares for larger datasets with more variables. I ran a test on a dataset with 1000000 observations and 20 variables. Also, I tried testing a third solution which uses a simple macro to create a single statement in the data step. Here are the results.

[pre]
1 data test;
2 array var{20};
3 do i = 1 to 1000000;
4 do j=1 to dim(var);
5 var{j} = floor(ranuni(0)*300);
6 end;
7 output;
8 end;
9 drop i j;
10 run;

NOTE: The data set WORK.TEST has 1000000 observations and 20 variables.
NOTE: DATA statement used (Total process time):
real time 3.57 seconds
cpu time 3.57 seconds

11
12 data chang;
13 set test;
14 array var(*) var:;
NOTE: The array var has the same name as a SAS-supplied or user-defined function. Parentheses
following this name are treated as array references and not function references.
15 unique = 1;
16 drop p q;
17 do p = 1 to dim(var)-1 while(unique);
18 do q = p + 1 to dim(var) while(unique);
19 unique = var(p)^=var(q);
20 end;
21 end;
22 run;

NOTE: There were 1000000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.CHANG has 1000000 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 24.56 seconds
cpu time 25.00 seconds

23
24 data data_null_;
25 length y 8 d \$1;
26 if _n_ = 1 then do;
27 call missing(y,d);
28 declare hash h();
29 h.definekey('y');
30 h.definedata('d');
31 h.definedone();
32 end;
33 set test;
34 h.clear();
35 array v
• var:;
36 do l=1 to dim(v) until(h.add(key:v,data:' ') gt 0);
37 end;
38 unique = l eq dim(v)+1;
39 output;
40 drop y d l;
41 run;

NOTE: There were 1000000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.DATA_NULL_ has 1000000 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 36.87 seconds
cpu time 37.23 seconds

42
43 %macro ne(array, elements);
44 %do i = 1 %to %eval(&elements-2);
45 %do j = %eval(&i+1) %to &elements;
46 (&array[&i] ne &array[&j]) &
47 %end;
48 %end;
49 (&array[%eval(&elements-1)] ne &array[&elements])
50 %mend;
51
52 data _null_;
53 if 0 then set test;
54 array var
• var:;
55 call symputx('num_vars', dim(var));
56 run;

NOTE: DATA STEP stopped due to looping.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

57
58 data polingjw;
59 set test;
60 array var
• var:;
61 unique = (%ne(var, &num_vars));
62 run;

NOTE: There were 1000000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.POLINGJW has 1000000 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 22.06 seconds
cpu time 22.15 seconds
[/pre]
Posts: 3,852

## Re: Detecting unique values across variables

See how you single statement performs if you use variable name references instead of array reference.
Regular Contributor
Posts: 171

## Re: Detecting unique values across variables

Hi data _null_,

I don’t get it. Don’t all three programs rely on array references? Please help me to understand.

Overall, it seems to me that Chang’s program is probably the best answer. His is the simplest program, it is comparable to the macro program that I posted in terms of efficiency (for my test dataset anyway), and I can see how it’s less problematic than using macros. I created the macro only because I was curious to see how using a single statement would compare to the other two excellent suggestions in terms of efficiency. Since I personally like Chang’s solution better than my own, I almost did not post my solution. But, then again, I suspected that if I did post the macro this thread could turn into a good learning experience for myself :-)

Thanks again.
Posts: 3,852

## Re: Detecting unique values across variables

My hypothesis is that

var1 eq var2
is faster than
var[1] eq var[2]

I will easy to test since your variable names are VARn, a bit harder to generalize.
Regular Contributor
Posts: 241

## Re: Detecting unique values across variables

@polingjw: Thanks for the nice words!

I think _null_ has a point. On my box, polingjw2 seems to run faster than polingjw1 below.

/* test data */
data test;
array var{20};
do i = 1 to 1000000;
do j=1 to dim(var);
var{j} = floor(ranuni(0)*300);
end;
output;
end;
drop i j;
run;

sasfile work.test.data open;

/* dummy step to load the data into sasfile.
see Mark^s sas-l posting
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind1102B&L=sas-l&D=1&H=0&O=D&T=1&P=10531 */
data _null_
set test;
run;

%macro ne(array, elements);
%do i = 1 %to %eval(&elements-2);
%do j = %eval(&i+1%to &elements;
(&array[&i] ne &array[&j]) &
%end;
%end;
(&array[%eval(&elements-1)] ne &array[&elements])
%mend;

data polingjw1;
set test;
array var
•  var:;

•      unique = (%ne(var, 20));
run;

%macro allDiff(root=, dim=);
%local i j maxi minj;
%let maxi = %eval(&dim - 2);
%do i = 1 %to &maxi;
%let minj = %eval(&i + 1);
%do j = &minj %to &dim;
%*; &root.&i ne &root&j &
%end;
%end;
&root.%eval(&dim - 1) ne &root.&dim
%mend  allDiff;

/* using no arrays but the variable names directly. it also
takes advantage of if expression short-circuiting as well.
see this and the thread mentioned in it:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0406D&L=sas-l&D=1&H=0&O=D&T=1&P=45708 */
data polingjw2;
set test;
if %allDiff(root=var, dim=20then unique = 1;
else unique = 0;
run;

sasfile work.test.data close;

Regular Contributor
Posts: 171

## Re: Detecting unique values across variables

A quick question about your last posting... Wouldn’t using a LOAD option on the SASFILE statement eliminate the need for the dummy step?

[pre]
[/pre]
Regular Contributor
Posts: 241

## Re: Detecting unique values across variables

@polingjw: Yep. You are absolutely correct. I should have used load instead of open. Thanks.
Contributor
Posts: 29

## Re: Detecting unique values across variables

Wow! Nice to see so many ways to do this. Here you go My contribution.
Another Method using dictionary tables .

/*Creating test data */
data test;
array var{5};
do j=1 to 10;
do i=1 to dim(var);
var{i}= floor(ranuni(0)*60);
end;
output;
end;
drop i j;
run;

/* here it checks the uniqueness. Doing a not to get 1 for uniquenes */
data t;
set test;
unique=not(var1=var2=var3=var4=var5);
run;

/* Taking the numeric column name from the sashelp.vcolumn(dictionary) and creating a macro */
proc sql;
select name INTO :VARS SEPARATED BY '=' from sashelp.Vcolumn where libname='WORK'
and memname='TEST' and type='num';
quit;

%PUT &VARS.;
DATA T;
SET TEST;
UNIQUE=NOT(&VARS);
RUN;

I've not done bench mark against the other methods to see the performance though.
Regular Contributor
Posts: 241

## Re: Detecting unique values across variables

@jonam: A good try! Unfortunately, however, your code is incorrect (unless you have only two variables). The reason is simply that the negation of "all same" is not "all different."

not(var1=var2=var3=var4)

is a shorthand form of

not (var1=var2 & var2=var3 & var3=var4)

And this is equivalent to

(var1^=var2 | var2^=var3 | var3^=var4)

This evaluates true, if any of the adjacent pairs happens to be different, even when the others are the same.
Contributor
Posts: 29

## Re: Detecting unique values across variables

@chang.

Oh! you are absolutely right! Thanks a lot for pointing that out....I've taken a different approach.

/*Creating test data */

data test;

do i = 1 to 4;

do j = 1 to 4;

do k = 1 to 4;

output;

end;

end;

end;

run;

/* Taking the numeric column name from the sashelp.vcolumn(dictionary) and creating a macro */
proc sql;
select compress(name), count(name) INTO :VARS SEPARATED BY ' ',:max from sashelp.Vcolumn where libname='WORK'
and memname='TEST' and type='num';
quit;

%PUT &VARS.;
/*
Samevar: to avoid comparing the same variable
Next : Once identified equal Move to next observation
*/

option obs=max;
DATA T;
SET TEST;
array a{*} &vars.;
end=&max.;
flag=0;
do l=1 to end;
unique=1;
do l1=2 to end;
if l=l1 then go to Samevar;
if a{l}=a{l1} then
do;
unique=0;
flag=1;
end;
if flag=1 then go to next;
end;
Samevar:

end;
next:
drop l l1 end flag;
RUN; removed the old method.

Message was edited by: jonam
Super User
Posts: 10,784

## Re: Detecting unique values across variables

Dear Arthur.Carpenter:
I do not know whether you like hash table, I use it find another way.

[pre]
data temp;
do i = 1 to 4;
do j = 1 to 4;
do k = 1 to 4;
output;
end;
end;
end;
run;

data result(drop=rc count _n id);
set temp;
declare hash hh(hashexp: 10);
declare hiter ff('hh');
hh.definekey('id');
hh.definedone();
array var{*} i--k;
do _n=1 to dim(var);
id=var{_n};
hh.replace();
end;
count=0;
rc=ff.first();
do while(rc=0);
count+1;
rc=ff.next();
end;
if count=dim(var) then flag=1;
else flag=0;
run;
[/pre]

Ksharp
Discussion stats
• 14 replies
• 2409 views
• 0 likes
• 6 in conversation