SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

count unique variable values across columns (within same row)

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

count unique variable values across columns (within same row)

Hello everyone, 

 

Im a sas beginner and I am trying to count the number of unique values across a range of columns (within the same row) 

 

For example, I have 130 columns that each contain either missing values or the names of a country and sometimes the same country name repeats. I want to create a separate variable that counts the number of unique values within the range. 

 

e.g.: 

 

   Col1      Col2       Col3         Col4      ..... Col131                  Count 

  U.S.        U.S.        .                                                                1  

  Korea     U.S.        China                                                        3

 

 

The example was for illustrative effect.

 

Thanks for your help!

 


Accepted Solutions
Solution
‎05-18-2016 06:52 PM
Super User
Posts: 5,083

Re: count unique variable values across columns (within same row)

Sorry, you're right.  I was counting the wrong thing (getting the count for the most frequently occurring country).  Luckily, what you want is easier.  Here's a better version:

 

data want;

set have;

array col {131} col1-col131;

array new {131} $20  _temporary_;

do _n_=1 to 131;

   new{_n_} = col{_n_};

end;

call sortc(of new{*});

count = (new{1} > ' ');

do _n_=2 to 131;

   if new{_n_}  ne new{_n_-1} then count + 1;

end;

run;

View solution in original post


All Replies
Super User
Posts: 10,500

Re: count unique variable values across columns (within same row)

There are likely a number of ways using arrays and lots of logic to search within the row but here is an approach that reshapes the data and uses other procedures to the desired count and the merges the result back to the data.

data have;
   infile datalines missover;
   informat Col1 - Col3 $10.;
   input Col1      Col2       Col3   ;
   row = _n_;
datalines;
  U.S.      U.S.               
  Korea     U.S.        China    
;
run;

proc transpose data=have out=trans;
   by row;
   var col: ;
run;

proc sql;
   create table levels as
   select row, count(col1) as count
   from (select distinct row, col1 from trans
         where not missing(col1))
   group by row;
quit;

Proc sql;
   create table want as 
   select have.*, levels.count
   from have left join levels on 
      have.row = levels.row
   ;
quit;
Contributor
Posts: 50

Re: count unique variable values across columns (within same row)

Thanks for your quick reply Ballard...Im getting a little bit lost in the table references though... This is what I tried to come up with modeled after your code...but it's obviously not working very well cause Im having trouble following. 

 

proc transpose data=adtr.as7 out=adtr.as8;
by year_cusip;
var nation_: ;
run;

 

 

proc sql;
create table adtr.levels as
select year_cusip, count(col1) as count
from (select distinct year_cusip, col1 from trans
where not missing(col1))
group by year_cusip;
quit;

 

 

Proc sql;
create table adtr.test as
select adtr.levels, levels.count
from adtr.levels left join levels on
adtr.levels.year_cusip = levels.row
;
quit;

 

 

 

Thanks for your help

Super User
Posts: 10,500

Re: count unique variable values across columns (within same row)


r4321 wrote:

Thanks for your quick reply Ballard...Im getting a little bit lost in the table references though... This is what I tried to come up with modeled after your code...but it's obviously not working very well cause Im having trouble following. 

 

 

 

Proc sql;
create table adtr.test as
select adtr.levels, levels.count
from adtr.levels left join levels on
adtr.levels.year_cusip = levels.row
;
quit;

 

 

 

Thanks for your help


Instead of adtr.levels use the name of the base data adtr.as7 (?) and

join on   levels.year_cuspid.

You should ahve gotten an error about the variable ROW not in the set Levels.

When discussing code please include log with any error messages.

Frequent Contributor
Posts: 123

Re: count unique variable values across columns (within same row)

data want;
  set have;
  array Country(*) col1-col131;
  CntryCount=0;
  do i=2 to 131;
    if Country(i) ne . and Country(i) ne Country(i-1) then CntryCount+1;
  end;
run;

Maybe start out with something like this (untested).

Contributor
Posts: 50

Re: count unique variable values across columns (within same row)

Thank you... 

I tried this:

 

 

data adtr.as8;
set adtr.as7;
array Country(*) nation_1-nation_131;
CntryCount=0;
do i=2 to 131;
if Country(i) ne . and Country(i) ne Country(i-1) then CntryCount+1;
end;
run;

 

 

However,  I get a number of errors. 

sample: 

 

NOTE: Invalid numeric data, 'France' , at line 2142 column 8.
NOTE: Invalid numeric data, 'Switzerland' , at line 2142 column 8.
NOTE: Invalid numeric data, 'United States' , at line 2142 column 8

 

Thanks 

Super User
Posts: 5,083

Re: count unique variable values across columns (within same row)

Here's a variation on the array approach:

 

data want;

set have;

array col {131} col1-col131;

array new {131} $20  _temporary_;

do _n_=1 to 131;

   new{_n_} = col{_n_};

end;

call sortc(of new{*});

maxcount=0;

current_count=1;

next_country = new{1};

do _n_=2 to 131;

   if next_country = new{_n_} then do;

      current_count + 1;

      if _n_=131 and next_country > ' ' then maxcount = max(maxcount, current_count);

   end;

   else do; 

      if next_country > ' ' then maxcount = max(maxcount, current_count);

      next_country = new{_n_};

      current_count=1;

      if _n_=131 then maxcount = max(maxcount, 1);

   end;

end;

drop current_count next_country;

run;

 

It's untested code so there may be a little work needed here, but it might work as is.

Contributor
Posts: 50

Re: count unique variable values across columns (within same row)

It seems as though the code runs.. But it doesn't quite return the desired result, the count is inaccurate. 


Screenshot (24).png
Contributor
Posts: 50

Re: count unique variable values across columns (within same row)


Screenshot (23).png
Solution
‎05-18-2016 06:52 PM
Super User
Posts: 5,083

Re: count unique variable values across columns (within same row)

Sorry, you're right.  I was counting the wrong thing (getting the count for the most frequently occurring country).  Luckily, what you want is easier.  Here's a better version:

 

data want;

set have;

array col {131} col1-col131;

array new {131} $20  _temporary_;

do _n_=1 to 131;

   new{_n_} = col{_n_};

end;

call sortc(of new{*});

count = (new{1} > ' ');

do _n_=2 to 131;

   if new{_n_}  ne new{_n_-1} then count + 1;

end;

run;

Contributor
Posts: 50

Re: count unique variable values across columns (within same row)

Awesome ...This worked great! So Im getting something out of this, would you mind checking my understanding? 

 


data adtr.as8;
set adtr.as7;
array col {131} nation_1-nation_131;       ****creates an array for nation_1-nation_131
array new {131} $20 _temporary_;           ****creates an identical array for nation_1-nation_131 in order to do comparisons
do _n_=1 to 131;                                       ***tells it go through 131 iterations
new{_n_} = col{_n_};                                 ***comparing the cells
end;
call sortc(of new{*});                                    **sorts the log to facilitate comparisons
count = (new{1} > ' ');                                   **only count if not missing
do _n_=2 to 131;                                           **this array tells it to count the number of unique values in the array. 
if new{_n_} ne new{_n_-1} then count + 1;
end;
run;

Super User
Posts: 5,083

Re: count unique variable values across columns (within same row)

You have the right idea.  I just wanted to embellish a few  points.

 

do _n_=1 to 131 is for copying the original variables into the temporary array.

 

The reason for creating a temporary array holding a copy of the variables:  the values need to be sorted.  This would be possible, but the wrong thing to do:

 

call sortc(of col{*});

 

It would change the values of your existing variables, which is likely a bad outcome.

 

count= sets up an initial value for COUNT of 0 (if the first element after sorting is missing) or 1 (if the first element after sorting is nonmissing).

 

do _n_=2 to 131 goes through the remaining elements in the array to see if there is a change in the value.  On the next line, any change triggers the program to increment COUNT.

New Contributor
Posts: 4

Re: count unique variable values across columns (within same row)

Hello

This code worked for me as well.Thank you

but the problem is that it counted the missing value as well. for instance, for the row like 4  5  4 . it showed 3.

how can i modify the code to solve this?

thanks a lot

Super User
Posts: 9,681

Re: count unique variable values across columns (within same row)

[ Edited ]

It is very easy for IML. Do you like IML code ?

 

data have;
   infile datalines missover;
   informat Col1 - Col3 $10.;
   input Col1      Col2       Col3   ;
datalines;
  U.S.      U.S.               
  Korea     U.S.        China    
;
run;
data want;
 if _n_=1 then do;
  length k $ 80;
  declare hash h();
  h.definekey('k');
  h.definedone();
 end;
 set have;
 array x{*} $ col: ;
 do i=1 to dim(x);
  if not missing(x{i}) then do;
   k=x{i};h.replace();
  end;
 end;
 count=h.num_items;
 h.clear();
 drop i k;
run;
 UPDATED
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 886 views
  • 3 likes
  • 6 in conversation