Hi
I have a big dataset from which I want a specific subsample.
The dataset contains work-shift data on employees from approx. 100 different companies, which have different numbers of employees. The dataset can include several observations/datalines for each person.
I want a subsample that consists of all the observations for 10% of the 100 companies (not 10% of the observations/datalines), where the 10% of companies are chosen at random. (I'm aware that this means the subsample size could differ depending on which companies are chosen.)
Does anyone have a suggestion for how to do this?
PROC SURVEYSELECT with the STRATA statement can do this.
Hi @Biniie,
The CLUSTER statement (alias: SAMPLINGUNIT) is ideal for your purpose.
Example:
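/* Simple random sample of 10% of the companies (not of the observations) */
proc surveyselect data=workshiftdata
                  method=srs samprate=10   /* 10 is read as 10 percent */
                  seed=2718 out=subsample; /* fixed seed makes the draw reproducible */
  cluster company;                         /* sampling unit: whole companies */
run;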
(Just replace workshiftdata, subsample and company by your input dataset, output dataset and variable names, respectively.)
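If you want to check the result, a quick sketch like the following could help. It assumes the same placeholder names as above (workshiftdata, subsample, company) and simply counts the distinct companies in the full data and in the subsample; with about 100 companies you would expect roughly 10 in the subsample.

```sas
/* Sanity check (sketch): compare the number of distinct companies */
/* in the input dataset and in the sampled dataset.                */
/* workshiftdata, subsample and company are placeholder names.     */
proc sql;
  select count(distinct company) as n_companies_all
    from workshiftdata;
  select count(distinct company) as n_companies_sampled
    from subsample;
quit;
```

You could compare the row counts per company in both datasets in the same way to confirm that the selected companies kept all of their observations.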