BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
r4321
Pyrite | Level 9

Hello everyone, 

 

Im a sas beginner and I am trying to count the number of unique values across a range of columns (within the same row) 

 

For example, I have 130 columns that each contain either missing values or the names of a country and sometimes the same country name repeats. I want to create a separate variable that counts the number of unique values within the range. 

 

e.g.: 

 

   Col1      Col2       Col3         Col4      ..... Col131                  Count 

  U.S.        U.S.        .                                                                1  

  Korea     U.S.        China                                                        3

 

 

The example was for illustrative effect.

 

Thanks for your help!

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Sorry, you're right.  I was counting the wrong thing (getting the count for the most frequently occurring country).  Luckily, what you want is easier.  Here's a better version:

 

data want;

set have;

array col {131} col1-col131;

array new {131} $20  _temporary_;

do _n_=1 to 131;

   new{_n_} = col{_n_};

end;

call sortc(of new{*});

count = (new{1} > ' ');

do _n_=2 to 131;

   if new{_n_}  ne new{_n_-1} then count + 1;

end;

run;

View solution in original post

13 REPLIES 13
ballardw
Super User

There are likely a number of ways using arrays and lots of logic to search within the row but here is an approach that reshapes the data and uses other procedures to the desired count and the merges the result back to the data.

data have;
   infile datalines missover;
   informat Col1 - Col3 $10.;
   input Col1      Col2       Col3   ;
   row = _n_;
datalines;
  U.S.      U.S.               
  Korea     U.S.        China    
;
run;

proc transpose data=have out=trans;
   by row;
   var col: ;
run;

proc sql;
   create table levels as
   select row, count(col1) as count
   from (select distinct row, col1 from trans
         where not missing(col1))
   group by row;
quit;

Proc sql;
   create table want as 
   select have.*, levels.count
   from have left join levels on 
      have.row = levels.row
   ;
quit;
r4321
Pyrite | Level 9

Thanks for your quick reply Ballard...Im getting a little bit lost in the table references though... This is what I tried to come up with modeled after your code...but it's obviously not working very well cause Im having trouble following. 

 

proc transpose data=adtr.as7 out=adtr.as8;
by year_cusip;
var nation_: ;
run;

 

 

proc sql;
create table adtr.levels as
select year_cusip, count(col1) as count
from (select distinct year_cusip, col1 from trans
where not missing(col1))
group by year_cusip;
quit;

 

 

Proc sql;
create table adtr.test as
select adtr.levels, levels.count
from adtr.levels left join levels on
adtr.levels.year_cusip = levels.row
;
quit;

 

 

 

Thanks for your help

ballardw
Super User

@r4321 wrote:

Thanks for your quick reply Ballard...Im getting a little bit lost in the table references though... This is what I tried to come up with modeled after your code...but it's obviously not working very well cause Im having trouble following. 

 

 

 

Proc sql;
create table adtr.test as
select adtr.levels, levels.count
from adtr.levels left join levels on
adtr.levels.year_cusip = levels.row
;
quit;

 

 

 

Thanks for your help


Instead of adtr.levels use the name of the base data adtr.as7 (?) and

join on   levels.year_cuspid.

You should ahve gotten an error about the variable ROW not in the set Levels.

When discussing code please include log with any error messages.

JediApprentice
Pyrite | Level 9
data want;
  set have;
  array Country(*) col1-col131;
  CntryCount=0;
  do i=2 to 131;
    if Country(i) ne . and Country(i) ne Country(i-1) then CntryCount+1;
  end;
run;

Maybe start out with something like this (untested).

r4321
Pyrite | Level 9

Thank you... 

I tried this:

 

 

data adtr.as8;
set adtr.as7;
array Country(*) nation_1-nation_131;
CntryCount=0;
do i=2 to 131;
if Country(i) ne . and Country(i) ne Country(i-1) then CntryCount+1;
end;
run;

 

 

However,  I get a number of errors. 

sample: 

 

NOTE: Invalid numeric data, 'France' , at line 2142 column 8.
NOTE: Invalid numeric data, 'Switzerland' , at line 2142 column 8.
NOTE: Invalid numeric data, 'United States' , at line 2142 column 8

 

Thanks 

Astounding
PROC Star

Here's a variation on the array approach:

 

data want;

set have;

array col {131} col1-col131;

array new {131} $20  _temporary_;

do _n_=1 to 131;

   new{_n_} = col{_n_};

end;

call sortc(of new{*});

maxcount=0;

current_count=1;

next_country = new{1};

do _n_=2 to 131;

   if next_country = new{_n_} then do;

      current_count + 1;

      if _n_=131 and next_country > ' ' then maxcount = max(maxcount, current_count);

   end;

   else do; 

      if next_country > ' ' then maxcount = max(maxcount, current_count);

      next_country = new{_n_};

      current_count=1;

      if _n_=131 then maxcount = max(maxcount, 1);

   end;

end;

drop current_count next_country;

run;

 

It's untested code so there may be a little work needed here, but it might work as is.

r4321
Pyrite | Level 9

It seems as though the code runs.. But it doesn't quite return the desired result, the count is inaccurate. 


Screenshot (24).png
Astounding
PROC Star

Sorry, you're right.  I was counting the wrong thing (getting the count for the most frequently occurring country).  Luckily, what you want is easier.  Here's a better version:

 

data want;

set have;

array col {131} col1-col131;

array new {131} $20  _temporary_;

do _n_=1 to 131;

   new{_n_} = col{_n_};

end;

call sortc(of new{*});

count = (new{1} > ' ');

do _n_=2 to 131;

   if new{_n_}  ne new{_n_-1} then count + 1;

end;

run;

r4321
Pyrite | Level 9

Awesome ...This worked great! So Im getting something out of this, would you mind checking my understanding? 

 


data adtr.as8;
set adtr.as7;
array col {131} nation_1-nation_131;       ****creates an array for nation_1-nation_131
array new {131} $20 _temporary_;           ****creates an identical array for nation_1-nation_131 in order to do comparisons
do _n_=1 to 131;                                       ***tells it go through 131 iterations
new{_n_} = col{_n_};                                 ***comparing the cells
end;
call sortc(of new{*});                                    **sorts the log to facilitate comparisons
count = (new{1} > ' ');                                   **only count if not missing
do _n_=2 to 131;                                           **this array tells it to count the number of unique values in the array. 
if new{_n_} ne new{_n_-1} then count + 1;
end;
run;

Astounding
PROC Star

You have the right idea.  I just wanted to embellish a few  points.

 

do _n_=1 to 131 is for copying the original variables into the temporary array.

 

The reason for creating a temporary array holding a copy of the variables:  the values need to be sorted.  This would be possible, but the wrong thing to do:

 

call sortc(of col{*});

 

It would change the values of your existing variables, which is likely a bad outcome.

 

count= sets up an initial value for COUNT of 0 (if the first element after sorting is missing) or 1 (if the first element after sorting is nonmissing).

 

do _n_=2 to 131 goes through the remaining elements in the array to see if there is a change in the value.  On the next line, any change triggers the program to increment COUNT.

fama
Fluorite | Level 6

Hello

This code worked for me as well.Thank you

but the problem is that it counted the missing value as well. for instance, for the row like 4  5  4 . it showed 3.

how can i modify the code to solve this?

thanks a lot

Ksharp
Super User

It is very easy for IML. Do you like IML code ?

 

data have;
   infile datalines missover;
   informat Col1 - Col3 $10.;
   input Col1      Col2       Col3   ;
datalines;
  U.S.      U.S.               
  Korea     U.S.        China    
;
run;
data want;
 if _n_=1 then do;
  length k $ 80;
  declare hash h();
  h.definekey('k');
  h.definedone();
 end;
 set have;
 array x{*} $ col: ;
 do i=1 to dim(x);
  if not missing(x{i}) then do;
   k=x{i};h.replace();
  end;
 end;
 count=h.num_items;
 h.clear();
 drop i k;
run;
 UPDATED

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 8807 views
  • 3 likes
  • 6 in conversation