abak
Obsidian | Level 7

I need to subsample a dataset within groups. Here is an example with two columns:

 

Group   Data
G1      v1
G1      v2
G1      v3
G1      v4
G2      v5
G2      v6
G3      v7
G3      v8
G3      v9
G3      v10
G3      v11

 

My real data has many more values in some groups. For groups with more than 2 values (for example), I want to sample only 60% of that group's values into a subset.

 

How do I accomplish that?

1 ACCEPTED SOLUTION
PGStats
Opal | Level 21

You could use PROC SURVEYSELECT and specify the per-group sample sizes in a secondary dataset:

 

proc sql;
create table sampleSize as
select 
    group,
    max(2, 0.6*count(*)) as _nsize_
from myData
group by group;
quit;

proc surveyselect data=myData selectall out=mySample sampsize=sampleSize;
strata group;
run;

(untested)

 

PG


2 REPLIES
abak
Obsidian | Level 7

Thank you! The only thing missing was round. Easily added!

 

round(max(2, 0.6*count(*)))
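
For anyone landing here later, the pieces from this thread can be put together roughly as follows. This is only a sketch: the DATA step recreates the example from the question, the column name value (standing in for the Data column), the explicit PROC SORT, the ORDER BY clause, and the SEED= value are assumptions added for illustration, and the code has not been run against the original poster's data. With the example data, the requested sizes work out to 2 for G1 (round of 2.4), 2 for G2, and 3 for G3.

```sas
/* Example data from the question; "value" stands in for the Data column */
data myData;
    input group $ value $;
    datalines;
G1 v1
G1 v2
G1 v3
G1 v4
G2 v5
G2 v6
G3 v7
G3 v8
G3 v9
G3 v10
G3 v11
;
run;

/* The STRATA statement expects the input sorted by the strata variable */
proc sort data=myData;
    by group;
run;

/* One row per group: 60% of the group size, at least 2, rounded to an integer */
proc sql;
    create table sampleSize as
    select
        group,
        round(max(2, 0.6*count(*))) as _nsize_
    from myData
    group by group
    order by group;
quit;

/* SELECTALL takes every row of a group that is smaller than its requested size. */
/* SEED= (arbitrary value here) makes the random draw repeatable.                */
proc surveyselect data=myData method=srs seed=12345 selectall
                  sampsize=sampleSize out=mySample;
    strata group;
run;
```

The floor of 2 in MAX is what keeps small groups intact: a group of 2 gets a requested size of 2, and a group of 1 is kept whole because of SELECTALL. Change 0.6 and 2 to adjust the sampling rate and the threshold.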
