Biniie
Fluorite | Level 6

Hi

I have a big dataset from which I want to draw a specific subsample.

The dataset contains work-shift data on employees from approx. 100 different companies, which have different numbers of employees. The dataset can include several observations (data lines) for each person.

I want a subsample that consists of all the observations for 10% of the 100 companies (not 10% of the observations), where the 10% of companies are chosen at random. (I'm aware that this means the subsample size can vary depending on which companies are chosen.)

Does anyone have a suggestion for how to do this?

2 REPLIES
PaigeMiller
Diamond | Level 26

PROC SURVEYSELECT with the STRATA statement can do this.

--
Paige Miller
FreelanceReinh
Jade | Level 19

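Hi @Biniie,

The CLUSTER statement (alias: SAMPLINGUNIT) of PROC SURVEYSELECT is ideal for your purpose: it samples whole companies and keeps all observations of each selected company.

Example: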

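proc surveyselect data=workshiftdata  /* input: all work-shift observations */
     method=srs      /* simple random sampling */
     samprate=10     /* 10 percent of the sampling units */
     seed=2718       /* fixed seed makes the selection reproducible */
     out=subsample;  /* all observations of the selected companies */
  cluster company;   /* sampling unit is the company, not the observation */
run;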

(Just replace workshiftdata, subsample and company by your input dataset, output dataset and variable names, respectively.)
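To see what the CLUSTER statement is doing, the same kind of sample can be built by hand in three steps: list the companies, sample 10% of that list, then keep every observation of the sampled companies. This is only a sketch under the same assumed names as above (workshiftdata, company); the dataset names companies, sampled_companies and subsample_manual are hypothetical, and the companies drawn here need not match those drawn by the CLUSTER version, even with the same seed.

```sas
/* Step 1: one row per company */
proc sort data=workshiftdata(keep=company) out=companies nodupkey;
  by company;
run;

/* Step 2: simple random sample of 10 percent of the companies */
proc surveyselect data=companies
     method=srs samprate=10
     seed=2718 out=sampled_companies;
run;

/* Step 3: keep all observations that belong to a sampled company */
proc sql;
  create table subsample_manual as
  select w.*
  from workshiftdata as w
  where w.company in (select company from sampled_companies);

  /* Check: number of distinct companies in the result */
  select count(distinct company) as n_companies
  from subsample_manual;
quit;
```

The final query is a quick sanity check that the result covers about 10% of the companies rather than 10% of the observations.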
