Solved: Re: Count distinct values in one column

Yves_Boonen · Posted 04-11-2016 05:49 AM

I have a table in SAS which contains 3 variables:

- dossier_handler (last name first name)

- r_creation_date (ddmmmyyyy)

- distributionoption (7 possible options containing spaces and words)

I would like to count how many of each distributionoption I can find in the table.

This is how the table could look like:

dossier_manager	r_creation_date	distributionoption
NameA	Date1	Option A
NameA	Date2	Option B
NameB	Date3	Option A
NameC	Date4	Option C
NameC	Date5	Option F
NameC	Date6	Option D
NameD	Date7	Option A
NameD	Date8	Option E
NameE	Date9	Option G
NameF	Date10	Option G

This is how my final result should look like:

distributionoption	count
Option A	755
Option B	58
Option C	142
Option D	228
Option E	71
Option F	6
Option G	565

I have tried count and distinct. I have tried creating a variable for each option and then using a where ("this option") this option +1. Neither of these options appear to be working.

Count and distinct code:

proc sort data=tables.focus;
by verdelingsoptie;
run;

proc sql;
create table tables.focus_ as
select count(distinct verdelingsoptie)
from tables.focus
;
quit;

Where (X=?) Y+1 code:

data tables.focus;
set tables.focus_;
by verdelingsoptie;

retain
  Fast_Track_1_erfgenaam
  Fast_Track_Leefgeld
  Fast_Track_Saldo
  Fast_Track_Sterkmaking
  Manuele_verdeelopdracht_kantoor
  Verdeling_notaris
  Verdeling_standaard
;

if first.verdelingsoptie
then do;
  Fast_Track_1_erfgenaam=0;
  Fast_Track_Leefgeld=0;
  Fast_Track_Saldo=0;
  Fast_Track_Sterkmaking=0;
  Manuele_verdeelopdracht_kantoor=0;
  Verdeling_notaris=0;
  Verdeling_standaard=0;
end;

select (verdelignsoptie);

when ('Fast Track 1 erfgenaam') Fast_Track_1_erfgenaam +1;
when ('Fast Track Leefgeld') Fast_Track_Leefgeld +1;
when ('Fast Track Saldo') Fast_Track_Saldo +1;
when ('Fast Track Sterkmaking') Fast_Track_Sterkmaking +1;
when ('Manuele verdeelopdracht kantoor') Manuele_verdeelopdracht_kantoor +1;
when ('Verdeling notaris') Verdeling_notaris +1;
when ('Verdeling standaard') Verdeling_standaard +1;

end;

if last.verdelingsoptie then output;
keep Fast_Track_1_erfgenaam Fast_Track_Leefgeld Fast_Track_Saldo Fast_Track_Sterkmaking Manuele_verdeelopdracht_kantoor Verdeling_notaris Verdeling_standaard;
run;

The count/distinct script results in a table that says there are 7 distributionoptions.

The where (X=?) Y+1 results in some errors:

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      528:7   529:7   530:7   531:7   532:7   533:7   534:7
NOTE: Variable verdelignsoptie is uninitialized.
NOTE: Variable first.verdelingsoptie is uninitialized.
NOTE: Variable last.verdelingsoptie is uninitialized.
NOTE: Invalid numeric data, 'Fast Track 1 erfgenaam' , at line 528 column 7.
_TEMG002=455 Fast_Track_1_erfgenaam=1 Fast_Track_Leefgeld=. Fast_Track_Saldo=.
Fast_Track_Sterkmaking=. Manuele_verdeelopdracht_kantoor=. Verdeling_notaris=.
Verdeling_standaard=. first.verdelingsoptie=0 verdelignsoptie=. last.verdelingsoptie=0 _ERROR_=1
_N_=1
NOTE: There were 1 observations read from the data set TABLES.FOCUS_.
NOTE: The data set TABLES.FOCUS has 0 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds

RW9 · Posted 04-11-2016 06:42 AM

Sorry, you haven't used my code. The key part of the SQL is the group by:

proc sql;
  create table TABLES.FOCUS_ as
  select  VERDELINGSOPTIE,        /* <= added, as we are grouping by */
          count(*) as COUNT
  from    TABLES.FOCUS
  group by VERDELINGSOPTIE;   /*  <--- this part here is important! */
quit;

View solution in original post

Yves_Boonen · Posted 04-11-2016 05:54 AM

In SQL I would do something like this:

select distributionoptions, count (*)

from tables.focus

group by distributionoptions

save as tables.focus_

RW9 · Posted 04-11-2016 06:28 AM

Hi,

For your first code:

proc sql;
  create table TABLES.FOCUS_ as
  select  count(*) as COUNT
  from    TABLES.FOCUS
  group by DISTRIBUTIONOPTION;
quit;

Should work fine. I note the code you post, doesn't match the names given in the description, so I used the description variable names. In your second code, you have:

by verdelingsoptie;

Which is not in that dataset, which is why you have problems. Could it be that you meant to use focus dataset rather than focus_? Although, in the description again, that variable is not mentioned - columns are:

dossier_manager

r_creation_date

distributionoption

Yves_Boonen · Posted 04-11-2016 06:38 AM

The table I start from is called "focus" in my "tables" library (tables.focus).

I want to perform the count on the variable "verdelingsoptie" (distributionoption in English).

After the count is complete, I want to store the new data in a table called "focus_" in my "tables" library (tables.focus_).

If I use your suggestion, which I tried also, this is the result:

proc sql;
create table tables.focus_ as
select count(*) as count
from tables.focus
order by verdelingsoptie
;
quit;

Result:

count
455
455
455
455
455
455
455
455
455
455

And that for 455 rows, because that's how many rows of data "tables.focus" contains.

RW9 · Posted 04-11-2016 06:42 AM

Sorry, you haven't used my code. The key part of the SQL is the group by:

proc sql;
  create table TABLES.FOCUS_ as
  select  VERDELINGSOPTIE,        /* <= added, as we are grouping by */
          count(*) as COUNT
  from    TABLES.FOCUS
  group by VERDELINGSOPTIE;   /*  <--- this part here is important! */
quit;

Yves_Boonen · Posted 04-11-2016 06:46 AM

Mondays cause me to be like "order by" = "group by". After changing that, it did work like you said. Thanks a dozen! 🙂

Kurt_Bremser · Posted 04-11-2016 06:49 AM

s/order by/group by/

😉

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Classroom Training Available!