Biniie
Fluorite | Level 6

Hi

I have a big dataset from which I want a specific subsample. 

 

The dataset contains work-shift data for employees of approximately 100 different companies, which have different numbers of employees. It can include several observations (data lines) for each person.

 

I want a subsample that consists of all the observations for 10% of the 100 companies (not 10% of the observations/data lines), where the 10% of companies are chosen at random. (I'm aware that this means the size of the subsample will depend on which companies are chosen.)

 

Does anyone have a suggestion for how to do this?

2 REPLIES 2
PaigeMiller
Diamond | Level 26

PROC SURVEYSELECT with the STRATA statement can do this.

--
Paige Miller
FreelanceReinh
Jade | Level 19

Hi @Biniie,

 

The CLUSTER statement (alias: SAMPLINGUNIT) is ideal for your purpose.

 

Example:

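proc surveyselect data=workshiftdata
                  method=srs      /* simple random sampling without replacement */
                  samprate=10     /* values greater than 1 are read as percentages, so 10% */
                  seed=2718       /* fixed seed makes the selection reproducible */
                  out=subsample;
   cluster company;               /* select whole companies and keep all their observations */
run;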

(Just replace workshiftdata, subsample and company by your input dataset, output dataset and variable names, respectively.)
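If you want to confirm that the sampling happened at the company level, a quick check along these lines might help. It is only a sketch and assumes the same placeholder names as above (workshiftdata, subsample, company); the column aliases n_companies_all and n_companies_sampled are made up for illustration.

```sas
/* Compare the number of distinct companies before and after sampling. */
proc sql;
   /* Companies in the full input dataset */
   select count(distinct company) as n_companies_all
      from workshiftdata;

   /* Companies that ended up in the subsample */
   select count(distinct company) as n_companies_sampled
      from subsample;
quit;
```

The second count should be roughly 10% of the first, while the number of observations in subsample depends on how large the selected companies are.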
