Solved: Calculate distinct string and N of Distinct string at row level

Ram_SAS · Posted 11-06-2019 08:18 PM

Hi,

data have;
length name1 name2 name3 name4 $20;
name1='B'; name2= ''; name3= 'A'; name4='A';
output;
name1=' '; name2= 'B'; name3= ''; name4='';
output;
name1='A'; name2= 'A'; name3= 'A'; name4='';
output;
run;

Want output as

row DistinctName DistinctN
1 A, B 2
2 B 1
3 A 1

Thanks in advance!

ballardw · Posted 11-07-2019 12:03 PM

Not pretty or elegant but should be robust and scalable if you actually have more variables:

data have;
length name1 name2 name3 name4 $20;
input  name1 name2 name3 name4;
row+1;
datalines;
B .  A A
. B .  .
A A A .
;

proc transpose data=have out=trans (where=( not missing(col1)) drop=_name_);
by row;
var name1-name4;
run;
proc sql;
   create table temp as
   select distinct row,col1
   from trans;
quit;
data want;
   set temp;
   by row;
   /* need to set length of a target to string
      to at least longest possible so 20*number of variables+ number of variables
      minus 1 for commas.
   */
   length DistinctName $85; 
   retain DistinctName;
   if first.row then DistinctName=col1;
   else DistinctName= catx(',',DistinctName,col1);
   distinctn= countw(DistinctName,',');
   if last.row;
   drop col1;
run;

Time to learn how to write a data step. That list of variables and output statements gets cumbersome quickly.

Also use the code box on the forum opened with the {I} or "running man" icons for posting code.

And just for curiosity how will that comma delimited list of "names" actually be used?

View solution in original post

jarg · Posted 11-07-2019 12:20 AM

is this what you're after?

proc sql;
create table cat_count as
select cats(name1,name2,name3,name4) as ConcatenatedString
,count(*) as Volume
from have
group by ConcatenatedString
order by ConcatenatedString;
quit;

Ram_SAS · Posted 11-07-2019 09:36 AM

No. you are basically concatenating and counting. Instead, the concatenated string should have unique sorted string separated by comma (e.g. A,B or A) and the distinct N should count distinct strings (e.g. 2 or 1).

ballardw · Posted 11-07-2019 12:03 PM

Not pretty or elegant but should be robust and scalable if you actually have more variables:

data have;
length name1 name2 name3 name4 $20;
input  name1 name2 name3 name4;
row+1;
datalines;
B .  A A
. B .  .
A A A .
;

proc transpose data=have out=trans (where=( not missing(col1)) drop=_name_);
by row;
var name1-name4;
run;
proc sql;
   create table temp as
   select distinct row,col1
   from trans;
quit;
data want;
   set temp;
   by row;
   /* need to set length of a target to string
      to at least longest possible so 20*number of variables+ number of variables
      minus 1 for commas.
   */
   length DistinctName $85; 
   retain DistinctName;
   if first.row then DistinctName=col1;
   else DistinctName= catx(',',DistinctName,col1);
   distinctn= countw(DistinctName,',');
   if last.row;
   drop col1;
run;

Time to learn how to write a data step. That list of variables and output statements gets cumbersome quickly.

Also use the code box on the forum opened with the {I} or "running man" icons for posting code.

And just for curiosity how will that comma delimited list of "names" actually be used?

Calculate distinct string and N of Distinct string at row level

Re: Calculate distinct string and N of Distinct string at row level

Re: Calculate distinct string and N of Distinct string at row level

Re: Calculate distinct string and N of Distinct string at row level

Re: Calculate distinct string and N of Distinct string at row level

Calculate distinct string and N of Distinct string at row level

Re: Calculate distinct string and N of Distinct string at row level

Re: Calculate distinct string and N of Distinct string at row level

Re: Calculate distinct string and N of Distinct string at row level

Re: Calculate distinct string and N of Distinct string at row level

Registration is open

SAS Training: Just a Click Away