I have a table in SAS which contains 3 variables:
- dossier_handler (last name first name)
- r_creation_date (ddmmmyyyy)
- distributionoption (7 possible options containing spaces and words)
I would like to count how many of each distributionoption I can find in the table.
This is how the table could look like:
dossier_manager | r_creation_date | distributionoption |
NameA | Date1 | Option A |
NameA | Date2 | Option B |
NameB | Date3 | Option A |
NameC | Date4 | Option C |
NameC | Date5 | Option F |
NameC | Date6 | Option D |
NameD | Date7 | Option A |
NameD | Date8 | Option E |
NameE | Date9 | Option G |
NameF | Date10 | Option G |
This is how my final result should look like:
distributionoption | count |
Option A | 755 |
Option B | 58 |
Option C | 142 |
Option D | 228 |
Option E | 71 |
Option F | 6 |
Option G | 565 |
I have tried count and distinct. I have tried creating a variable for each option and then using a where ("this option") this option +1. Neither of these options appear to be working.
Count and distinct code:
proc sort data=tables.focus;
by verdelingsoptie;
run;
proc sql;
create table tables.focus_ as
select count(distinct verdelingsoptie)
from tables.focus
;
quit;
Where (X=?) Y+1 code:
data tables.focus;
set tables.focus_;
by verdelingsoptie;
retain
Fast_Track_1_erfgenaam
Fast_Track_Leefgeld
Fast_Track_Saldo
Fast_Track_Sterkmaking
Manuele_verdeelopdracht_kantoor
Verdeling_notaris
Verdeling_standaard
;
if first.verdelingsoptie
then do;
Fast_Track_1_erfgenaam=0;
Fast_Track_Leefgeld=0;
Fast_Track_Saldo=0;
Fast_Track_Sterkmaking=0;
Manuele_verdeelopdracht_kantoor=0;
Verdeling_notaris=0;
Verdeling_standaard=0;
end;
select (verdelignsoptie);
when ('Fast Track 1 erfgenaam') Fast_Track_1_erfgenaam +1;
when ('Fast Track Leefgeld') Fast_Track_Leefgeld +1;
when ('Fast Track Saldo') Fast_Track_Saldo +1;
when ('Fast Track Sterkmaking') Fast_Track_Sterkmaking +1;
when ('Manuele verdeelopdracht kantoor') Manuele_verdeelopdracht_kantoor +1;
when ('Verdeling notaris') Verdeling_notaris +1;
when ('Verdeling standaard') Verdeling_standaard +1;
end;
if last.verdelingsoptie then output;
keep Fast_Track_1_erfgenaam Fast_Track_Leefgeld Fast_Track_Saldo Fast_Track_Sterkmaking Manuele_verdeelopdracht_kantoor Verdeling_notaris Verdeling_standaard;
run;
The count/distinct script results in a table that says there are 7 distributionoptions.
The where (X=?) Y+1 results in some errors:
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
528:7 529:7 530:7 531:7 532:7 533:7 534:7
NOTE: Variable verdelignsoptie is uninitialized.
NOTE: Variable first.verdelingsoptie is uninitialized.
NOTE: Variable last.verdelingsoptie is uninitialized.
NOTE: Invalid numeric data, 'Fast Track 1 erfgenaam' , at line 528 column 7.
_TEMG002=455 Fast_Track_1_erfgenaam=1 Fast_Track_Leefgeld=. Fast_Track_Saldo=.
Fast_Track_Sterkmaking=. Manuele_verdeelopdracht_kantoor=. Verdeling_notaris=.
Verdeling_standaard=. first.verdelingsoptie=0 verdelignsoptie=. last.verdelingsoptie=0 _ERROR_=1
_N_=1
NOTE: There were 1 observations read from the data set TABLES.FOCUS_.
NOTE: The data set TABLES.FOCUS has 0 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
Sorry, you haven't used my code. The key part of the SQL is the group by:
proc sql; create table TABLES.FOCUS_ as select VERDELINGSOPTIE, /* <= added, as we are grouping by */
count(*) as COUNT from TABLES.FOCUS group by VERDELINGSOPTIE; /* <--- this part here is important! */ quit;
In SQL I would do something like this:
select distributionoptions, count (*)
from tables.focus
group by distributionoptions
save as tables.focus_
Hi,
For your first code:
proc sql; create table TABLES.FOCUS_ as select count(*) as COUNT from TABLES.FOCUS group by DISTRIBUTIONOPTION; quit;
Should work fine. I note the code you post, doesn't match the names given in the description, so I used the description variable names. In your second code, you have:
by verdelingsoptie;
Which is not in that dataset, which is why you have problems. Could it be that you meant to use focus dataset rather than focus_? Although, in the description again, that variable is not mentioned - columns are:
dossier_manager | r_creation_date | distributionoption |
The table I start from is called "focus" in my "tables" library (tables.focus).
I want to perform the count on the variable "verdelingsoptie" (distributionoption in English).
After the count is complete, I want to store the new data in a table called "focus_" in my "tables" library (tables.focus_).
If I use your suggestion, which I tried also, this is the result:
proc sql;
create table tables.focus_ as
select count(*) as count
from tables.focus
order by verdelingsoptie
;
quit;
Result:
count |
455 |
455 |
455 |
455 |
455 |
455 |
455 |
455 |
455 |
455 |
And that for 455 rows, because that's how many rows of data "tables.focus" contains.
Sorry, you haven't used my code. The key part of the SQL is the group by:
proc sql; create table TABLES.FOCUS_ as select VERDELINGSOPTIE, /* <= added, as we are grouping by */
count(*) as COUNT from TABLES.FOCUS group by VERDELINGSOPTIE; /* <--- this part here is important! */ quit;
Mondays cause me to be like "order by" = "group by". After changing that, it did work like you said. Thanks a dozen! 🙂
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.