Hi
I have a big dataset from which I want a specific subsample.
The dataset contains workshift data for employees of approximately 100 different companies, and the companies have different numbers of employees. The dataset can include several observations (data lines) for each person.
I want a subsample that consists of all the observations for a randomly chosen 10% of the 100 companies (not 10% of the observations/data lines). I'm aware that this means the size of the subsample will vary depending on which companies are chosen.
Does anyone have a suggestion for how to do this?
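PROC SURVEYSELECT can do this. Note, however, that the STRATA statement would draw 10% of the observations within each company; to sample whole companies, use the CLUSTER statement instead.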
Hi @Biniie,
The CLUSTER statement (alias: SAMPLINGUNIT) is ideal for your purpose.
Example:
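proc surveyselect data=workshiftdata   /* input: one row per workshift observation */
                  method=srs           /* simple random sampling of the clusters */
                  samprate=10          /* values above 1 are read as a percentage: 10% */
                  seed=2718            /* fixed seed makes the sample reproducible */
                  out=subsample;       /* output: all rows of the selected companies */
   cluster company;                    /* sample whole companies, not single rows */
run;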
(Just replace workshiftdata, subsample and company with the names of your input dataset, output dataset and company variable, respectively.)
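If you would rather fix the number of companies than the rate, a variant along these lines should work. It is a sketch assuming the same placeholder names as above; subsample10 is a hypothetical output name. SAMPSIZE= replaces SAMPRATE=, and because of the CLUSTER statement it counts companies rather than rows. The PROC SQL step is only a quick sanity check on the result.

```sas
/* Variant: draw exactly 10 companies instead of a 10% rate.
   workshiftdata, company and subsample10 are placeholder names. */
proc surveyselect data=workshiftdata
                  method=srs
                  sampsize=10          /* number of companies (clusters) to draw */
                  seed=2718            /* same seed idea: reproducible draw */
                  out=subsample10;
   cluster company;                    /* whole companies are the sampling units */
run;

/* Check: number of distinct companies and total rows in the sample */
proc sql;
   select count(distinct company) as n_companies,
          count(*)                as n_rows
   from subsample10;
quit;
```

With about 100 companies, SAMPRATE=10 and SAMPSIZE=10 select roughly the same number of companies; in both cases n_rows still depends on which companies happen to be drawn.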