BookmarkSubscribeRSS Feed
alex_philby
Obsidian | Level 7

Hello everyone !

 

My data looks like this :

 

occupationemployerqualification
dentistPrivate9
construction workerPrivate2
construction workerPublic2
dentistPrivate9
dentistPublic8
zoologistPublic8
dentistPublic6
dentistPublic6
police officerPublic4
police officerPublic4
police officerPublic4
police officerPublic4

 

I want to have the modes of the qualification variable for each occupation/employer couple. I'd also like to have counts for the modes. If there is an equality, the lowest qualification would be selected (lowest is 1 and highest is 9).


Result would be something like this :

occupationemployerqualificationcount
dentistPrivate92
dentist Public81
dentist Public62
construction workerPrivate21
construction workerPublic21
zoologistPublic81
police officerPublic44

 

My data has 1 million observations, so an sql approach is probably best.

6 REPLIES 6
novinosrin
Tourmaline | Level 20
data have;
input occupation & $25.	employer :$20.	qualification ;	
cards;
dentist		Private	9
construction worker		Private	2
construction worker		Public	2
dentist		Private	9
dentist		Public	8
zoologist	Public	8
dentist		Public	6
dentist		Public	6
police officer		Public	4
police officer		Public	4
police officer		Public	4
police officer		Public	4
;
proc sql;
create table want as
select   occupation,employer,qualification,count( qualification ) as count
from have
group by occupation,employer,qualification;
quit;
Astounding
PROC Star

To get counts (before limiting results to modes):

 

proc freq data=have;

tables occupation * employer * qualification / out=want (drop=percent);

run;

alex_philby
Obsidian | Level 7

I already tried this but it takes hours before SAS finally crash down.

Astounding
PROC Star

It's difficult to believe that would happen with only 1M observations, but there is a workaround:

 

proc sort data=have;

by occupation;

run;

proc freq data=have;

by occupation;

tables employer * qualification / out=want (drop=percent);

run;

alex_philby
Obsidian | Level 7

I think it's because at work we have to run SAS on some crappy virtual machine. I always have this problem when I run proc freq on 1M+ tables.

Ok so now I've got the counts but no way to find how to get the modes on SAS. Proc univariate only allows you to have the mode of one variable, not the modes of occupation*employer*qualification.

Any idea ?

Astounding
PROC Star

If your data set holding the counts is named COUNTS, you would just need to sort and subset:

 

proc sort data=counts;

   by occupation employer descending count qualification;

run;

 

data want;

set counts;

by occupation employer descending count qualification;

if first.count;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 790 views
  • 0 likes
  • 3 in conversation