I need to subsample a dataset within groups. Here is an example with two columns:
Group Data
G1 v1
G1 v2
G1 v3
G1 v4
G2 v5
G2 v6
G3 v7
G3 v8
G3 v9
G3 v10
G3 v11
I actually have many more data points for some groups. For groups with more than 2 values (for example), I want to sample only 60% of the values from that group into a subset.
How do I accomplish that?
You could use proc surveyselect and specify your sample sizes in a secondary dataset:
proc sql;
    create table sampleSize as
    select
        group,
        max(2, 0.6*count(*)) as _nsize_
    from myData
    group by group;
quit;
proc surveyselect data=myData selectall out=mySample sampsize=sampleSize;
    strata group;
run;
(untested)
Thank you! The only thing missing was round. Easily added!
round(max(2, 0.6*count(*)))
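For anyone landing here later, this is a sketch of the answer's code with the round() fix folded in. The dataset and variable names (myData, group, sampleSize, mySample) are the ones used in this thread. The proc sort step and the order by clause are my additions, on the assumption that the input may not already be sorted by the strata variable. Like the original, treat it as untested.

```sas
/* Per-group sample sizes: 60% of the group, at least 2, rounded to a whole number. */
proc sql;
    create table sampleSize as
    select
        group,
        round(max(2, 0.6*count(*))) as _nsize_
    from myData
    group by group
    order by group;   /* added: keep the size table in strata order */
quit;

/* Added assumption: sort the input by the strata variable before sampling. */
proc sort data=myData;
    by group;
run;

/* selectall takes every row when a group has fewer rows than its requested size. */
proc surveyselect data=myData selectall out=mySample sampsize=sampleSize;
    strata group;
run;
```

With the example data above, the requested sizes work out to 2 for G1 (0.6*4 = 2.4, rounded), 2 for G2 and 3 for G3. If you need the same sample on every run, proc surveyselect also accepts a seed= option.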