I have a table which has multiple columns and large number of rows. A sub sample give below , in this case Name1-Name4 are different column names which coded with two character values.
Name1 | Name2 | Name3 | Name4 |
AA | BB | CC | DD |
AA | AA | BB | EE |
FF | FF | CC | CC |
AA | AA | AA | AA |
BB | BB | BB | ZZ |
I am interested in finding distinct in number of characters in each row . that is final output table has to look like this
Name1 | Name2 | Name3 | Name4 | Count |
AA | BB | CC | DD | 4 |
AA | AA | BB | EE | 3 |
FF | FF | CC | CC | 2 |
AA | AA | AA | AA | 1 |
BB | BB | BB | ZZ | 2 |
In the first row all the elements are distinct , so the count is 4. In second row there are 3 distinct values [AA,BB,EE]. Similarly for others
Can you please help to solve this issues without using multiple if-else statements.
Thanks,
Yoonus
Art,
It's conceivable that blanks should be treated as a valid level in the counting. But if not, here's how I would modify the solution:
array names {*} name1-name4;
count=1;
decrement=0;
do _J_=1 to dim(names)-1;
increment=1;
if names{_J_}=' ' then decrement=1;
do _K_=_J_+1 to dim(names);
if names{_J_} = names{_K_} then increment=0;
end;
count + increment;
end;
if names{dim(names)}=' ' then decrement=1;
count = count - decrement;
Your data is not in third normal form. If it would, the task is simple.
If you lack of a natural key, set a row id, transpose your data, then use select count(distinct name) group by rowid.
data have;
id+1;
input (Name1 Name2 Name3 Name4)($);
datalines;
AA BB CC DD
AA AA BB EE
FF FF CC CC
AA AA AA AA
BB BB BB ZZ
;
proc transpose data=have out=have1;
by id;
var name1-name4;
run;
proc sql;
select a.name1,a.name2,a.name3,a.name4,b.distinct_count from have a
left join (select id,count(distinct col1) as distinct_count from have1 group by id) b
on a.id=b.id;
quit;
Here's some code that could be stuck in a DATA step to help.
array names {*} name1-name4;
count=1;
do _J_=1 to dim(names)-1;
increment=1;
do _K_=_J_+1 to dim(names);
if names{_J_} = names{_K_} then increment=0;
end;
count + increment;
end;
You could hard-code 4 instead of using the DIM function. But it looked like the number of elements might actually be much larger in real life.
Good luck.
's suggestion won't work correctly if you have missing values. I think that the following would get around that:
data want (drop=_:);
set have;
array names(*) name1-name4;
array _names(4) $;
do _i=1 to 4;
_names(_i)=names(_i);
end;
call sortc(of _names(*));
count=ifn(missing(_names(dim(names))),0,1);
do _i=1 to dim(names)-1;
if not missing(_names(_i)) then do;
if _names(_i) ne _names(_i+1) then count+1;
end;
end;
run;
Art,
It's conceivable that blanks should be treated as a valid level in the counting. But if not, here's how I would modify the solution:
array names {*} name1-name4;
count=1;
decrement=0;
do _J_=1 to dim(names)-1;
increment=1;
if names{_J_}=' ' then decrement=1;
do _K_=_J_+1 to dim(names);
if names{_J_} = names{_K_} then increment=0;
end;
count + increment;
end;
if names{dim(names)}=' ' then decrement=1;
count = count - decrement;
Here is one way:
data have;
input (Name1 Name2 Name3 Name4)($);
datalines;
AA BB CC DD
AA AA BB EE
FF FF CC CC
AA AA AA AA
BB BB BB ZZ
;
run;
data want;
set have;
array k
count = 1;
do i = 1 to dim(k) - 1;
if k ^= k[i + 1] then count + 1;
end;
drop i;
run;
The output:
Obs Name1 Name2 Name3 Name4 count
1 AA BB CC DD 4
2 AA AA BB EE 3
3 FF FF CC CC 2
4 AA AA AA AA 1
5 BB BB BB ZZ 2
Hi Muthia,
Your solution does not work for the following case that give wrong result:
AA BB CC DD 4
AA AA BB EE 3
FF CC FF CC 4
AA AA AA AA 1
BB BB ZZ BB 3
CTorres
I overlooked the ordering of Values.
Thanks to Gergely Bathó.
Using call sortc() will help in this case, but the values will be in ascending order in the output.
data want;
set have;
array k
call sortc(of k
count = 1;
do i = 1 to dim(k) - 1;
if k ^= k[i + 1] then count + 1;
end;
drop i;
run;
data have; input (Name1 Name2 Name3 Name4)($); datalines; AA BB CC DD AA AA BB EE FF FF CC CC AA AA AA AA BB BB BB ZZ ; run; data have(drop=i); set have; array t{100} $32 _temporary_ ; array x{*} $ name1-name4; n=0;call missing(of t{*}); do i=1 to dim(x); if x{i} not in t then do;n+1;t{n}=x{i};end; end; run;
Xia Keshan
Ksharp, thank you for a very thoughtful and efficient solution.
I modified the array references a little bit -- to make it work in a PROC FCMP.
Since SAS doesn't have a built-in function like this, I wanted to make my own.
proc datasets nolist lib=work; delete functions; quit;
options cmplib=_null_;
proc fcmp outlib=work.functions.smd;
function count_dist_val(arr_val{*} $ ) varargs ;
array t{100} $100 _temporary_ ;
n = 0;
call missing( of t1-t100);
do i=1 to dim(arr_val);
if arr_val{i} not in (t1-t100) then
do;
n + 1 ;
t{n} = arr_val{i} ;
end;
end;
return (n);
endsub;
run;
options cmplib=work.functions;
data have;
input (name1 name2 name3 name4)($);
array x{*} $ name1-name4;
dist_val = count_dist_val(x) ;
put dist_val= ;
datalines;
AA BB CC DD
AA AA BB EE
FF FF CC CC
AA AA AA AA
BB BB BB ZZ
;
run;
Hi, another (belated) idea though you end up with sorted names ...
data x;
input (name1-name4) (:$2.);
datalines;
AA BB CC DD
AA AA BB EE
FF FF CC CC
AA AA AA AA
BB BB BB ZZ
AA CC CC AA
JJ KK KK KK
ZZ YY ZZ YY
QQ QQ EE FF
;
data y (drop=names);
length names $30;
set x;
call sortc(of name1-name4);
names = catx(' ',of name: );
howmany = (count(tranwrd(names,name1,''),name2) ne 0) +
(count(tranwrd(names,name2,''),name3) ne 0) +
(count(tranwrd(names,name3,''),name4) ne 0) + 1;
run;
Or ...
data y (drop=j);
set x;
array name(4);
call sortc(of name1-name4);
howmany = 1;
do j=1 to 3;
howmany + ^^find(tranwrd(catx(' ',of name: ),name(j),''),name(j+1));
end;
run;
(^^find ... changes result from location to 1 if found, 0 if not ... thanks to Howard Schreier)
DATA SET: y
name1 name2 name3 name4 howmany
AA BB CC DD 4
AA AA BB EE 3
CC CC FF FF 2
AA AA AA AA 1
BB BB BB ZZ 2
AA AA CC CC 2
JJ KK KK KK 2
YY YY ZZ ZZ 2
EE FF QQ QQ 3
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.