abak
Obsidian | Level 7

I need to subsample a dataset within groups. Here is an example with two columns:

 

Group    Data
G1       v1
G1       v2
G1       v3
G1       v4
G2       v5
G2       v6
G3       v7
G3       v8
G3       v9
G3       v10
G3       v11

 

I actually have many more data points for some groups. For groups with more than 2 values (for example), I want to sample only 60% of that group's values into a subset.

 

How do I accomplish that?

1 ACCEPTED SOLUTION

PGStats
Opal | Level 21

You could use proc surveyselect and specify your sample sizes in a secondary dataset:

 

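/* One row per group: requested sample size is 60% of the group count, but at least 2 */
proc sql;
create table sampleSize as
select 
    group,
    max(2, 0.6*count(*)) as _nsize_
from myData
group by group;
quit;

/* Stratified sample by group; selectall takes every row of a group smaller than its requested size */
proc surveyselect data=myData selectall out=mySample sampsize=sampleSize;
strata group;
run;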

(untested)

 

PG


abak
Obsidian | Level 7

Thank you! The only thing missing was round. Easily added!

 

round(max(2, 0.6*count(*)))
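
For anyone finding this later, here is a sketch of the whole thing put together with the round() fix applied. The data step just recreates the example from the question; the variable name value, the proc sort step, and the seed=12345 option are my own additions for illustration and are not part of PG's answer. As far as I know, proc surveyselect expects the input data to be sorted by the strata variable, which is why the sort is there.

```sas
/* Hypothetical input mirroring the example in the question.
   The variable name "value" is an assumption for illustration. */
data myData;
    input group $ value $;
    datalines;
G1 v1
G1 v2
G1 v3
G1 v4
G2 v5
G2 v6
G3 v7
G3 v8
G3 v9
G3 v10
G3 v11
;
run;

/* Sort by the strata variable before sampling (added as a precaution) */
proc sort data=myData;
    by group;
run;

/* Requested size per group: 60% of the group count, at least 2,
   rounded to a whole number */
proc sql;
    create table sampleSize as
    select
        group,
        round(max(2, 0.6*count(*))) as _nsize_
    from myData
    group by group;
quit;

/* Stratified simple random sample; selectall takes the whole group
   when the requested size exceeds the group size.
   seed= is an assumption added here so reruns give the same sample. */
proc surveyselect data=myData selectall out=mySample
                  sampsize=sampleSize seed=12345;
    strata group;
run;
```

With the example data, the requested sizes should work out to 2 for G1 (60% of 4 is 2.4), 2 for G2 (the minimum of 2 applies), and 3 for G3 (60% of 5), so mySample should have 7 rows.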
