Hello everyone !
My data looks like this :
occupation | employer | qualification |
dentist | Private | 9 |
construction worker | Private | 2 |
construction worker | Public | 2 |
dentist | Private | 9 |
dentist | Public | 8 |
zoologist | Public | 8 |
dentist | Public | 6 |
dentist | Public | 6 |
police officer | Public | 4 |
police officer | Public | 4 |
police officer | Public | 4 |
police officer | Public | 4 |
I want to have the modes of the qualification variable for each occupation/employer couple. I'd also like to have counts for the modes. If there is an equality, the lowest qualification would be selected (lowest is 1 and highest is 9).
Result would be something like this :
occupation | employer | qualification | count |
dentist | Private | 9 | 2 |
dentist | Public | 8 | 1 |
dentist | Public | 6 | 2 |
construction worker | Private | 2 | 1 |
construction worker | Public | 2 | 1 |
zoologist | Public | 8 | 1 |
police officer | Public | 4 | 4 |
My data has 1 million observations, so an sql approach is probably best.
data have;
input occupation & $25. employer :$20. qualification ;
cards;
dentist Private 9
construction worker Private 2
construction worker Public 2
dentist Private 9
dentist Public 8
zoologist Public 8
dentist Public 6
dentist Public 6
police officer Public 4
police officer Public 4
police officer Public 4
police officer Public 4
;
proc sql;
create table want as
select occupation,employer,qualification,count( qualification ) as count
from have
group by occupation,employer,qualification;
quit;
To get counts (before limiting results to modes):
proc freq data=have;
tables occupation * employer * qualification / out=want (drop=percent);
run;
I already tried this but it takes hours before SAS finally crash down.
It's difficult to believe that would happen with only 1M observations, but there is a workaround:
proc sort data=have;
by occupation;
run;
proc freq data=have;
by occupation;
tables employer * qualification / out=want (drop=percent);
run;
I think it's because at work we have to run SAS on some crappy virtual machine. I always have this problem when I run proc freq on 1M+ tables.
Ok so now I've got the counts but no way to find how to get the modes on SAS. Proc univariate only allows you to have the mode of one variable, not the modes of occupation*employer*qualification.
Any idea ?
If your data set holding the counts is named COUNTS, you would just need to sort and subset:
proc sort data=counts;
by occupation employer descending count qualification;
run;
data want;
set counts;
by occupation employer descending count qualification;
if first.count;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.