Hello everyone !
My data looks like this :
occupation | employer | qualification |
dentist | Private | 9 |
construction worker | Private | 2 |
construction worker | Public | 2 |
dentist | Private | 9 |
dentist | Public | 8 |
zoologist | Public | 8 |
dentist | Public | 6 |
dentist | Public | 6 |
police officer | Public | 4 |
police officer | Public | 4 |
police officer | Public | 4 |
police officer | Public | 4 |
I want to have the modes of the qualification variable for each occupation/employer couple. I'd also like to have counts for the modes. If there is an equality, the lowest qualification would be selected (lowest is 1 and highest is 9).
Result would be something like this :
occupation | employer | qualification | count |
dentist | Private | 9 | 2 |
dentist | Public | 8 | 1 |
dentist | Public | 6 | 2 |
construction worker | Private | 2 | 1 |
construction worker | Public | 2 | 1 |
zoologist | Public | 8 | 1 |
police officer | Public | 4 | 4 |
My data has 1 million observations, so an sql approach is probably best.
data have;
input occupation & $25. employer :$20. qualification ;
cards;
dentist Private 9
construction worker Private 2
construction worker Public 2
dentist Private 9
dentist Public 8
zoologist Public 8
dentist Public 6
dentist Public 6
police officer Public 4
police officer Public 4
police officer Public 4
police officer Public 4
;
proc sql;
create table want as
select occupation,employer,qualification,count( qualification ) as count
from have
group by occupation,employer,qualification;
quit;
To get counts (before limiting results to modes):
proc freq data=have;
tables occupation * employer * qualification / out=want (drop=percent);
run;
I already tried this but it takes hours before SAS finally crash down.
It's difficult to believe that would happen with only 1M observations, but there is a workaround:
proc sort data=have;
by occupation;
run;
proc freq data=have;
by occupation;
tables employer * qualification / out=want (drop=percent);
run;
I think it's because at work we have to run SAS on some crappy virtual machine. I always have this problem when I run proc freq on 1M+ tables.
Ok so now I've got the counts but no way to find how to get the modes on SAS. Proc univariate only allows you to have the mode of one variable, not the modes of occupation*employer*qualification.
Any idea ?
If your data set holding the counts is named COUNTS, you would just need to sort and subset:
proc sort data=counts;
by occupation employer descending count qualification;
run;
data want;
set counts;
by occupation employer descending count qualification;
if first.count;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.