Biniie
Fluorite | Level 6

Hi

I have a big dataset from which I want a specific subsample. 

 

The dataset contains work-shift data for employees of approx. 100 different companies, which have different numbers of employees. The dataset can include several observations (data lines) for each person.

 

I want a subsample that consists of all the observations for 10% of the 100 companies (not 10% of the observations/data lines), where the 10% of companies are chosen at random. (I'm aware that this means the size of the subsample can vary depending on which companies are chosen.)

 

Does anyone have a suggestion for how to do this?

2 REPLIES
PaigeMiller
Diamond | Level 26

PROC SURVEYSELECT with the STRATA statement can do this.

--
Paige Miller
FreelanceReinh
Jade | Level 19

Hi @Biniie,

 

The CLUSTER statement (alias: SAMPLINGUNIT) is ideal for your purpose.

 

Example:

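/* Select a simple random sample of 10% of the companies (clusters);
   all observations of each selected company go to the OUT= dataset. */
proc surveyselect data=workshiftdata
                  method=srs samprate=10
                  seed=2718 out=subsample;
  cluster company;
run;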

(Just replace workshiftdata, subsample and company by your input dataset, output dataset and variable names, respectively.)
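
To confirm that whole companies (and not individual observations) were sampled, a quick check along the following lines may help. This is only a sketch: it assumes the dataset and variable names from the example above (workshiftdata, subsample, company), and the column aliases (companies_total, companies_sampled, n_full, n_sub) are made up for illustration.

```sas
/* Sanity check, assuming the names from the example above:
   WORKSHIFTDATA (input), SUBSAMPLE (output), COMPANY (cluster variable). */
proc sql;
  /* Number of distinct companies before and after sampling */
  select count(distinct company) as companies_total
    from workshiftdata;
  select count(distinct company) as companies_sampled
    from subsample;

  /* Observations per selected company: full data vs. subsample.
     The two counts should agree if whole clusters were selected. */
  select a.company, a.n_full, b.n_sub
    from (select company, count(*) as n_full
            from workshiftdata group by company) as a
         inner join
         (select company, count(*) as n_sub
            from subsample group by company) as b
      on a.company = b.company;
quit;
```

With roughly 100 companies and samprate=10, companies_sampled should come out at about 10, and n_full should equal n_sub in every row of the last table.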
