Solved: Re: how to get number of distinct elements in each row without nested ...

SAS_Dude · Posted 10-14-2014 10:43 AM

I have a table which has multiple columns and large number of rows. A sub sample give below , in this case Name1-Name4 are different column names which coded with two character values.

Name1	Name2	Name3	Name4
AA	BB	CC	DD
AA	AA	BB	EE
FF	FF	CC	CC
AA	AA	AA	AA
BB	BB	BB	ZZ

I am interested in finding distinct in number of characters in each row . that is final output table has to look like this

Name1	Name2	Name3	Name4	Count
AA	BB	CC	DD	4
AA	AA	BB	EE	3
FF	FF	CC	CC	2
AA	AA	AA	AA	1
BB	BB	BB	ZZ	2

In the first row all the elements are distinct , so the count is 4. In second row there are 3 distinct values [AA,BB,EE]. Similarly for others

Can you please help to solve this issues without using multiple if-else statements.

Thanks,

Yoonus

Astounding · Posted 10-14-2014 01:37 PM

Art,

It's conceivable that blanks should be treated as a valid level in the counting. But if not, here's how I would modify the solution:

array names {*} name1-name4;

count=1;

decrement=0;

do _J_=1 to dim(names)-1;

increment=1;

if names{_J_}=' ' then decrement=1;

do _K_=_J_+1 to dim(names);

if names{_J_} = names{_K_} then increment=0;

end;

count + increment;

end;

if names{dim(names)}=' ' then decrement=1;

count = count - decrement;

View solution in original post

LinusH · Posted 10-14-2014 11:07 AM

Your data is not in third normal form. If it would, the task is simple.

If you lack of a natural key, set a row id, transpose your data, then use select count(distinct name) group by rowid.

Data never sleeps

stat_sas · Posted 10-14-2014 11:11 AM

data have;

id+1;

input (Name1 Name2 Name3 Name4)($);

datalines;

AA BB CC DD

AA AA BB EE

FF FF CC CC

AA AA AA AA

BB BB BB ZZ

;

proc transpose data=have out=have1;

by id;

var name1-name4;

run;

proc sql;

select a.name1,a.name2,a.name3,a.name4,b.distinct_count from have a

left join (select id,count(distinct col1) as distinct_count from have1 group by id) b

on a.id=b.id;

quit;

Astounding · Posted 10-14-2014 11:12 AM

Here's some code that could be stuck in a DATA step to help.

array names {*} name1-name4;

count=1;

do _J_=1 to dim(names)-1;

increment=1;

do _K_=_J_+1 to dim(names);

if names{_J_} = names{_K_} then increment=0;

end;

count + increment;

end;

You could hard-code 4 instead of using the DIM function. But it looked like the number of elements might actually be much larger in real life.

Good luck.

art297 · Posted 10-14-2014 11:21 AM

's suggestion won't work correctly if you have missing values. I think that the following would get around that:

data want (drop=_:);

set have;

array names(*) name1-name4;

array _names(4) $;

do _i=1 to 4;

_names(_i)=names(_i);

end;

call sortc(of _names(*));

count=ifn(missing(_names(dim(names))),0,1);

do _i=1 to dim(names)-1;

if not missing(_names(_i)) then do;

if _names(_i) ne _names(_i+1) then count+1;

end;

run;

Astounding · Posted 10-14-2014 01:37 PM

Art,

It's conceivable that blanks should be treated as a valid level in the counting. But if not, here's how I would modify the solution:

array names {*} name1-name4;

count=1;

decrement=0;

do _J_=1 to dim(names)-1;

increment=1;

if names{_J_}=' ' then decrement=1;

do _K_=_J_+1 to dim(names);

if names{_J_} = names{_K_} then increment=0;

end;

count + increment;

end;

if names{dim(names)}=' ' then decrement=1;

count = count - decrement;

KachiM · Posted 10-14-2014 12:08 PM

Here is one way:

data have;

input (Name1 Name2 Name3 Name4)($);

datalines;

AA BB CC DD

AA AA BB EE

FF FF CC CC

AA AA AA AA

BB BB BB ZZ

;

run;

data want;

set have;

array k

$ Name1 - Name4;

count = 1;

do i = 1 to dim(k) - 1;

if k ^= k[i + 1] then count + 1;

end;

drop i;

run;

The output:

Obs Name1 Name2 Name3 Name4 count

1 AA BB CC DD 4

2 AA AA BB EE 3

3 FF FF CC CC 2

4 AA AA AA AA 1

5 BB BB BB ZZ 2

CTorres · Posted 10-14-2014 03:03 PM

Hi Muthia,

Your solution does not work for the following case that give wrong result:

AA BB CC DD 4

AA AA BB EE 3

FF CC FF CC 4

AA AA AA AA 1

BB BB ZZ BB 3

CTorres

KachiM · Posted 10-15-2014 04:11 AM

I overlooked the ordering of Values.

Thanks to Gergely Bathó.

Using call sortc() will help in this case, but the values will be in ascending order in the output.

data want;

set have;

array k

$ Name1 - Name4;

call sortc(of k

);

count = 1;

do i = 1 to dim(k) - 1;

if k ^= k[i + 1] then count + 1;

end;

drop i;

run;

Ksharp · Posted 10-15-2014 05:04 AM

data have;
input (Name1 Name2 Name3 Name4)($);
datalines;
AA BB CC DD
AA AA BB EE
FF FF CC CC
AA AA AA AA
BB BB BB ZZ
;
run;
data have(drop=i);
set have;
array t{100} $32 _temporary_ ;
array x{*} $ name1-name4;
n=0;call missing(of t{*});
do i=1 to dim(x);
if x{i} not in t then do;n+1;t{n}=x{i};end;
end;
run;

Xia Keshan

agoldma · Posted 07-02-2017 11:26 PM

Ksharp, thank you for a very thoughtful and efficient solution.

I modified the array references a little bit -- to make it work in a PROC FCMP.

Since SAS doesn't have a built-in function like this, I wanted to make my own.

proc datasets nolist lib=work; delete functions; quit;

options cmplib=_null_;

proc fcmp outlib=work.functions.smd;

function count_dist_val(arr_val{*} $ ) varargs ;

array t{100} $100 _temporary_ ;

n = 0;

call missing( of t1-t100);

do i=1 to dim(arr_val);

if arr_val{i} not in (t1-t100) then

do;

n + 1 ;

t{n} = arr_val{i} ;

end;

return (n);

endsub;

run;

options cmplib=work.functions;

data have;

input (name1 name2 name3 name4)($);

array x{*} $ name1-name4;

dist_val = count_dist_val(x) ;

put dist_val= ;

datalines;

AA BB CC DD

AA AA BB EE

FF FF CC CC

AA AA AA AA

BB BB BB ZZ

;

run;

MikeZdeb · Posted 07-03-2017 04:43 PM

Hi, another (belated) idea though you end up with sorted names ...

data x;
input (name1-name4) (:$2.);
datalines;
AA BB CC DD
AA AA BB EE
FF FF CC CC
AA AA AA AA
BB BB BB ZZ
AA CC CC AA
JJ KK KK KK
ZZ YY ZZ YY
QQ QQ EE FF
;

data y (drop=names);
length names $30;
set x;
call sortc(of name1-name4);
names = catx(' ',of name: );
howmany = (count(tranwrd(names,name1,''),name2) ne 0) +
(count(tranwrd(names,name2,''),name3) ne 0) +
(count(tranwrd(names,name3,''),name4) ne 0) + 1;
run;

Or ...

data y (drop=j);
set x;
array name(4);
call sortc(of name1-name4);
howmany = 1;
do j=1 to 3;
howmany + ^^find(tranwrd(catx(' ',of name: ),name(j),''),name(j+1));
end;
run;

(^^find ... changes result from location to 1 if found, 0 if not ... thanks to Howard Schreier)

DATA SET: y

name1    name2    name3    name4    howmany

 AA       BB       CC       DD         4
 AA       AA       BB       EE         3
 CC       CC       FF       FF         2
 AA       AA       AA       AA         1
 BB       BB       BB       ZZ         2
 AA       AA       CC       CC         2
 JJ       KK       KK       KK         2
 YY       YY       ZZ       ZZ         2
 EE       FF       QQ       QQ         3

how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Re: how to get number of distinct elements in each row without nested if-else loops

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away