I have a table in SAS which contains 3 variables:
- dossier_handler (last name first name)
- r_creation_date (ddmmmyyyy)
- distributionoption (7 possible options containing spaces and words)
I would like to count how many of each distributionoption I can find in the table.
This is how the table could look like:
dossier_manager | r_creation_date | distributionoption |
NameA | Date1 | Option A |
NameA | Date2 | Option B |
NameB | Date3 | Option A |
NameC | Date4 | Option C |
NameC | Date5 | Option F |
NameC | Date6 | Option D |
NameD | Date7 | Option A |
NameD | Date8 | Option E |
NameE | Date9 | Option G |
NameF | Date10 | Option G |
This is how my final result should look like:
distributionoption | count |
Option A | 755 |
Option B | 58 |
Option C | 142 |
Option D | 228 |
Option E | 71 |
Option F | 6 |
Option G | 565 |
I have tried count and distinct. I have tried creating a variable for each option and then using a where ("this option") this option +1. Neither of these options appear to be working.
Count and distinct code:
proc sort data=tables.focus;
by verdelingsoptie;
run;
proc sql;
create table tables.focus_ as
select count(distinct verdelingsoptie)
from tables.focus
;
quit;
Where (X=?) Y+1 code:
data tables.focus;
set tables.focus_;
by verdelingsoptie;
retain
Fast_Track_1_erfgenaam
Fast_Track_Leefgeld
Fast_Track_Saldo
Fast_Track_Sterkmaking
Manuele_verdeelopdracht_kantoor
Verdeling_notaris
Verdeling_standaard
;
if first.verdelingsoptie
then do;
Fast_Track_1_erfgenaam=0;
Fast_Track_Leefgeld=0;
Fast_Track_Saldo=0;
Fast_Track_Sterkmaking=0;
Manuele_verdeelopdracht_kantoor=0;
Verdeling_notaris=0;
Verdeling_standaard=0;
end;
select (verdelignsoptie);
when ('Fast Track 1 erfgenaam') Fast_Track_1_erfgenaam +1;
when ('Fast Track Leefgeld') Fast_Track_Leefgeld +1;
when ('Fast Track Saldo') Fast_Track_Saldo +1;
when ('Fast Track Sterkmaking') Fast_Track_Sterkmaking +1;
when ('Manuele verdeelopdracht kantoor') Manuele_verdeelopdracht_kantoor +1;
when ('Verdeling notaris') Verdeling_notaris +1;
when ('Verdeling standaard') Verdeling_standaard +1;
end;
if last.verdelingsoptie then output;
keep Fast_Track_1_erfgenaam Fast_Track_Leefgeld Fast_Track_Saldo Fast_Track_Sterkmaking Manuele_verdeelopdracht_kantoor Verdeling_notaris Verdeling_standaard;
run;
The count/distinct script results in a table that says there are 7 distributionoptions.
The where (X=?) Y+1 results in some errors:
NOTE: Character values have been converted to numeric values at the places given by:
(Line):(Column).
528:7 529:7 530:7 531:7 532:7 533:7 534:7
NOTE: Variable verdelignsoptie is uninitialized.
NOTE: Variable first.verdelingsoptie is uninitialized.
NOTE: Variable last.verdelingsoptie is uninitialized.
NOTE: Invalid numeric data, 'Fast Track 1 erfgenaam' , at line 528 column 7.
_TEMG002=455 Fast_Track_1_erfgenaam=1 Fast_Track_Leefgeld=. Fast_Track_Saldo=.
Fast_Track_Sterkmaking=. Manuele_verdeelopdracht_kantoor=. Verdeling_notaris=.
Verdeling_standaard=. first.verdelingsoptie=0 verdelignsoptie=. last.verdelingsoptie=0 _ERROR_=1
_N_=1
NOTE: There were 1 observations read from the data set TABLES.FOCUS_.
NOTE: The data set TABLES.FOCUS has 0 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
Sorry, you haven't used my code. The key part of the SQL is the group by:
proc sql; create table TABLES.FOCUS_ as select VERDELINGSOPTIE, /* <= added, as we are grouping by */
count(*) as COUNT from TABLES.FOCUS group by VERDELINGSOPTIE; /* <--- this part here is important! */ quit;
In SQL I would do something like this:
select distributionoptions, count (*)
from tables.focus
group by distributionoptions
save as tables.focus_
Hi,
For your first code:
proc sql; create table TABLES.FOCUS_ as select count(*) as COUNT from TABLES.FOCUS group by DISTRIBUTIONOPTION; quit;
Should work fine. I note the code you post, doesn't match the names given in the description, so I used the description variable names. In your second code, you have:
by verdelingsoptie;
Which is not in that dataset, which is why you have problems. Could it be that you meant to use focus dataset rather than focus_? Although, in the description again, that variable is not mentioned - columns are:
dossier_manager | r_creation_date | distributionoption |
The table I start from is called "focus" in my "tables" library (tables.focus).
I want to perform the count on the variable "verdelingsoptie" (distributionoption in English).
After the count is complete, I want to store the new data in a table called "focus_" in my "tables" library (tables.focus_).
If I use your suggestion, which I tried also, this is the result:
proc sql;
create table tables.focus_ as
select count(*) as count
from tables.focus
order by verdelingsoptie
;
quit;
Result:
count |
455 |
455 |
455 |
455 |
455 |
455 |
455 |
455 |
455 |
455 |
And that for 455 rows, because that's how many rows of data "tables.focus" contains.
Sorry, you haven't used my code. The key part of the SQL is the group by:
proc sql; create table TABLES.FOCUS_ as select VERDELINGSOPTIE, /* <= added, as we are grouping by */
count(*) as COUNT from TABLES.FOCUS group by VERDELINGSOPTIE; /* <--- this part here is important! */ quit;
Mondays cause me to be like "order by" = "group by". After changing that, it did work like you said. Thanks a dozen! 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.