DATA Step, Macro, Functions and more

Count distinct values in one column

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

Count distinct values in one column

[ Edited ]

I have a table in SAS which contains 3 variables:

- dossier_handler (last name first name)

- r_creation_date (ddmmmyyyy)

- distributionoption (7 possible options containing spaces and words)

 

I would like to count how many of each distributionoption I can find in the table.

 

This is how the table could look like:

 

dossier_managerr_creation_datedistributionoption
NameADate1Option A
NameADate2Option B
NameBDate3Option A
NameCDate4Option C
NameCDate5Option F
NameCDate6Option D
NameDDate7Option A
NameDDate8Option E
NameEDate9Option G
NameFDate10Option G

 

This is how my final result should look like:

 

distributionoptioncount
Option A755
Option B58
Option C142
Option D228
Option E71
Option F6
Option G565

 

I have tried count and distinct. I have tried creating a variable for each option and then using a where ("this option") this option +1. Neither of these options appear to be working.

 

Count and distinct code:

 

proc sort data=tables.focus;
by verdelingsoptie;
run;

proc sql;
create table tables.focus_ as
select count(distinct verdelingsoptie)
from tables.focus
;
quit;

 

Where (X=?) Y+1 code:

 

data tables.focus;
set tables.focus_;
by verdelingsoptie;

retain
  Fast_Track_1_erfgenaam
  Fast_Track_Leefgeld
  Fast_Track_Saldo
  Fast_Track_Sterkmaking
  Manuele_verdeelopdracht_kantoor
  Verdeling_notaris
  Verdeling_standaard
;

if first.verdelingsoptie
then do;
  Fast_Track_1_erfgenaam=0;
  Fast_Track_Leefgeld=0;
  Fast_Track_Saldo=0;
  Fast_Track_Sterkmaking=0;
  Manuele_verdeelopdracht_kantoor=0;
  Verdeling_notaris=0;
  Verdeling_standaard=0;
end;

select (verdelignsoptie);

when ('Fast Track 1 erfgenaam') Fast_Track_1_erfgenaam +1;
when ('Fast Track Leefgeld') Fast_Track_Leefgeld +1;
when ('Fast Track Saldo') Fast_Track_Saldo +1;
when ('Fast Track Sterkmaking') Fast_Track_Sterkmaking +1;
when ('Manuele verdeelopdracht kantoor') Manuele_verdeelopdracht_kantoor +1;
when ('Verdeling notaris') Verdeling_notaris +1;
when ('Verdeling standaard') Verdeling_standaard +1;

end;

if last.verdelingsoptie then output;
keep Fast_Track_1_erfgenaam Fast_Track_Leefgeld Fast_Track_Saldo Fast_Track_Sterkmaking Manuele_verdeelopdracht_kantoor Verdeling_notaris Verdeling_standaard;
run;

 

The count/distinct script results in a table that says there are 7 distributionoptions.

 

The where (X=?) Y+1 results in some errors:

 

NOTE: Character values have been converted to numeric values at the places given by:
      (Line):(Column).
      528:7   529:7   530:7   531:7   532:7   533:7   534:7
NOTE: Variable verdelignsoptie is uninitialized.
NOTE: Variable first.verdelingsoptie is uninitialized.
NOTE: Variable last.verdelingsoptie is uninitialized.
NOTE: Invalid numeric data, 'Fast Track 1 erfgenaam' , at line 528 column 7.
_TEMG002=455 Fast_Track_1_erfgenaam=1 Fast_Track_Leefgeld=. Fast_Track_Saldo=.
Fast_Track_Sterkmaking=. Manuele_verdeelopdracht_kantoor=. Verdeling_notaris=.
Verdeling_standaard=. first.verdelingsoptie=0 verdelignsoptie=. last.verdelingsoptie=0 _ERROR_=1
_N_=1
NOTE: There were 1 observations read from the data set TABLES.FOCUS_.
NOTE: The data set TABLES.FOCUS has 0 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds

 


Accepted Solutions
Solution
‎04-11-2016 06:46 AM
Super User
Super User
Posts: 7,997

Re: Count distinct values in one column

Posted in reply to Yves_Boonen

Sorry, you haven't used my code.  The key part of the SQL is the group by:

proc sql;
  create table TABLES.FOCUS_ as
  select  VERDELINGSOPTIE,        /* <= added, as we are grouping by */
count(*) as COUNT from TABLES.FOCUS group by VERDELINGSOPTIE; /* <--- this part here is important! */ quit;

View solution in original post


All Replies
Contributor
Posts: 62

Re: Count distinct values in one column

Posted in reply to Yves_Boonen

In SQL I would do something like this:

 

select distributionoptions, count (*)

from tables.focus

group by distributionoptions

save as tables.focus_

 

Super User
Super User
Posts: 7,997

Re: Count distinct values in one column

Posted in reply to Yves_Boonen

Hi,

 

For your first code:

proc sql;
  create table TABLES.FOCUS_ as
  select  count(*) as COUNT
  from    TABLES.FOCUS
  group by DISTRIBUTIONOPTION;
quit;

Should work fine.  I note the code you post, doesn't match the names given in the description, so I used the description variable names.  In your second code, you have:

by verdelingsoptie;

Which is not in that dataset, which is why you have problems.  Could it be that you meant to use focus dataset rather than focus_?  Although, in the description again, that variable is not mentioned - columns are: 

dossier_manager r_creation_date distributionoption
Contributor
Posts: 62

Re: Count distinct values in one column

The table I start from is called "focus" in my "tables" library (tables.focus).

I want to perform the count on the variable "verdelingsoptie" (distributionoption in English).

After the count is complete, I want to store the new data in a table called "focus_" in my "tables" library (tables.focus_).

 

If I use your suggestion, which I tried also, this is the result:

 

proc sql;
create table tables.focus_ as
select count(*) as count
from tables.focus
order by verdelingsoptie
;
quit;

 

Result:

count
455
455
455
455
455
455
455
455
455
455

 

And that for 455 rows, because that's how many rows of data "tables.focus" contains.

Solution
‎04-11-2016 06:46 AM
Super User
Super User
Posts: 7,997

Re: Count distinct values in one column

Posted in reply to Yves_Boonen

Sorry, you haven't used my code.  The key part of the SQL is the group by:

proc sql;
  create table TABLES.FOCUS_ as
  select  VERDELINGSOPTIE,        /* <= added, as we are grouping by */
count(*) as COUNT from TABLES.FOCUS group by VERDELINGSOPTIE; /* <--- this part here is important! */ quit;
Contributor
Posts: 62

Re: Count distinct values in one column

Mondays cause me to be like "order by" = "group by". After changing that, it did work like you said. Thanks a dozen! Smiley Happy

Super User
Posts: 7,866

Re: Count distinct values in one column

Posted in reply to Yves_Boonen

s/order by/group by/

Smiley Wink

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 965 views
  • 2 likes
  • 3 in conversation