Obsidian | Level 7

## Calculate distinct string and N of Distinct string at row level

Hi,

data have;
length name1 name2 name3 name4 \$20;
name1='B'; name2= ''; name3= 'A'; name4='A';
output;
name1=' '; name2= 'B'; name3= ''; name4='';
output;
name1='A'; name2= 'A'; name3= 'A'; name4='';
output;
run;

Want output as

row DistinctName DistinctN
1    A, B                      2
2    B                          1
3    A                          1

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Calculate distinct string and N of Distinct string at row level

Not pretty or elegant but should be robust and scalable if you actually have more variables:

```data have;
length name1 name2 name3 name4 \$20;
input  name1 name2 name3 name4;
row+1;
datalines;
B .  A A
. B .  .
A A A .
;

proc transpose data=have out=trans (where=( not missing(col1)) drop=_name_);
by row;
var name1-name4;
run;
proc sql;
create table temp as
select distinct row,col1
from trans;
quit;
data want;
set temp;
by row;
/* need to set length of a target to string
to at least longest possible so 20*number of variables+ number of variables
minus 1 for commas.
*/
length DistinctName \$85;
retain DistinctName;
if first.row then DistinctName=col1;
else DistinctName= catx(',',DistinctName,col1);
distinctn= countw(DistinctName,',');
if last.row;
drop col1;
run;```

Time to learn how to write a data step. That list of variables and output statements gets cumbersome quickly.

Also use the code box on the forum opened with the {I} or "running man" icons for posting code.

And just for curiosity how will that comma delimited list of "names" actually be used?

3 REPLIES 3
Fluorite | Level 6

## Re: Calculate distinct string and N of Distinct string at row level

is this what you're after?

proc sql;
create table cat_count as
select cats(name1,name2,name3,name4) as ConcatenatedString
,count(*) as Volume
from have
group by ConcatenatedString
order by ConcatenatedString;
quit;

Obsidian | Level 7

## Re: Calculate distinct string and N of Distinct string at row level

No. you are basically concatenating and counting. Instead, the concatenated string should have unique sorted string separated by comma (e.g. A,B or A) and the distinct N should count distinct strings (e.g. 2 or 1).
Super User

## Re: Calculate distinct string and N of Distinct string at row level

Not pretty or elegant but should be robust and scalable if you actually have more variables:

```data have;
length name1 name2 name3 name4 \$20;
input  name1 name2 name3 name4;
row+1;
datalines;
B .  A A
. B .  .
A A A .
;

proc transpose data=have out=trans (where=( not missing(col1)) drop=_name_);
by row;
var name1-name4;
run;
proc sql;
create table temp as
select distinct row,col1
from trans;
quit;
data want;
set temp;
by row;
/* need to set length of a target to string
to at least longest possible so 20*number of variables+ number of variables
minus 1 for commas.
*/
length DistinctName \$85;
retain DistinctName;
if first.row then DistinctName=col1;
else DistinctName= catx(',',DistinctName,col1);
distinctn= countw(DistinctName,',');
if last.row;
drop col1;
run;```

Time to learn how to write a data step. That list of variables and output statements gets cumbersome quickly.

Also use the code box on the forum opened with the {I} or "running man" icons for posting code.

And just for curiosity how will that comma delimited list of "names" actually be used?

Discussion stats
• 3 replies
• 1982 views
• 0 likes
• 3 in conversation