Subsampling within Group

Contributor
Posts: 23

I need to subsample a dataset within groups. Here is an example with two columns:

Group   Data
G1      v1
G1      v2
G1      v3
G1      v4
G2      v5
G2      v6
G3      v7
G3      v8
G3      v9
G3      v10
G3      v11

My real data has many more data points for some groups. For groups with more than 2 values (for example), I want to sample only 60% of the values from that group into a subset.

How do I accomplish that?


Accepted Solutions
Solution
‎03-13-2018 10:03 AM
Esteemed Advisor
Posts: 5,474

Re: Subsampling within Group

You could use PROC SURVEYSELECT and specify the per-group sample sizes in a secondary dataset:

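/* One row per group: 60% of the group size, but never fewer than 2 */
proc sql;
create table sampleSize as
select
    group,
    max(2, 0.6*count(*)) as _nsize_
from myData
group by group;
quit;

/* selectall keeps every row of a group that has fewer rows than _nsize_ */
proc surveyselect data=myData selectall out=mySample sampsize=sampleSize;
strata group;
run;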

(untested)

PG
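
Since the code above is marked untested, one quick way to check it is to compare the per-group row counts before and after sampling. This is only a sketch and assumes the dataset and variable names used above (myData, mySample, group):

```sas
/* Rows per group in the full data */
proc freq data=myData;
    tables group / nocum nopercent;
run;

/* Rows per group in the sample; each count should equal the requested size for that group */
proc freq data=mySample;
    tables group / nocum nopercent;
run;
```

Groups with 2 or fewer rows should show the same count in both tables, because selectall keeps all of their rows.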

Contributor
Posts: 23

Re: Subsampling within Group

Thank you! The only thing missing was round(), which was easily added:

round(max(2, 0.6*count(*)))
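
For anyone who wants to try this end to end, here is a sketch that combines the solution with the round() fix on the example data from the question. The dataset name myData, the variable names group and data, the explicit sorts, and the seed value are assumptions added for the illustration, not part of the original answer:

```sas
/* Example data from the question; names are assumed */
data myData;
    input group $ data $;
    datalines;
G1 v1
G1 v2
G1 v3
G1 v4
G2 v5
G2 v6
G3 v7
G3 v8
G3 v9
G3 v10
G3 v11
;
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=myData;
    by group;
run;

/* One row per group; round() makes _nsize_ a whole number */
proc sql;
    create table sampleSize as
    select
        group,
        round(max(2, 0.6*count(*))) as _nsize_
    from myData
    group by group;
quit;

/* Keep the size table in the same group order as the data */
proc sort data=sampleSize;
    by group;
run;

/* seed= makes the draw repeatable; 12345 is an arbitrary choice */
proc surveyselect data=myData selectall out=mySample
                  sampsize=sampleSize seed=12345;
    strata group;
run;
```

With this data the requested sizes work out to 2 rows for G1 (0.6*4 = 2.4, rounded), 2 rows for G2 (all of them), and 3 rows for G3 (0.6*5 = 3). Which rows are drawn depends on the seed.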
