BookmarkSubscribeRSS Feed
alex_philby
Obsidian | Level 7

Hello everyone !

 

My data looks like this :

 

occupationemployerqualification
dentistPrivate9
construction workerPrivate2
construction workerPublic2
dentistPrivate9
dentistPublic8
zoologistPublic8
dentistPublic6
dentistPublic6
police officerPublic4
police officerPublic4
police officerPublic4
police officerPublic4

 

I want to have the modes of the qualification variable for each occupation/employer couple. I'd also like to have counts for the modes. If there is an equality, the lowest qualification would be selected (lowest is 1 and highest is 9).


Result would be something like this :

occupationemployerqualificationcount
dentistPrivate92
dentist Public81
dentist Public62
construction workerPrivate21
construction workerPublic21
zoologistPublic81
police officerPublic44

 

My data has 1 million observations, so an sql approach is probably best.

6 REPLIES 6
novinosrin
Tourmaline | Level 20
data have;
input occupation & $25.	employer :$20.	qualification ;	
cards;
dentist		Private	9
construction worker		Private	2
construction worker		Public	2
dentist		Private	9
dentist		Public	8
zoologist	Public	8
dentist		Public	6
dentist		Public	6
police officer		Public	4
police officer		Public	4
police officer		Public	4
police officer		Public	4
;
proc sql;
create table want as
select   occupation,employer,qualification,count( qualification ) as count
from have
group by occupation,employer,qualification;
quit;
Astounding
PROC Star

To get counts (before limiting results to modes):

 

proc freq data=have;

tables occupation * employer * qualification / out=want (drop=percent);

run;

alex_philby
Obsidian | Level 7

I already tried this but it takes hours before SAS finally crash down.

Astounding
PROC Star

It's difficult to believe that would happen with only 1M observations, but there is a workaround:

 

proc sort data=have;

by occupation;

run;

proc freq data=have;

by occupation;

tables employer * qualification / out=want (drop=percent);

run;

alex_philby
Obsidian | Level 7

I think it's because at work we have to run SAS on some crappy virtual machine. I always have this problem when I run proc freq on 1M+ tables.

Ok so now I've got the counts but no way to find how to get the modes on SAS. Proc univariate only allows you to have the mode of one variable, not the modes of occupation*employer*qualification.

Any idea ?

Astounding
PROC Star

If your data set holding the counts is named COUNTS, you would just need to sort and subset:

 

proc sort data=counts;

   by occupation employer descending count qualification;

run;

 

data want;

set counts;

by occupation employer descending count qualification;

if first.count;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 751 views
  • 0 likes
  • 3 in conversation