agewell150
Fluorite | Level 6

I have a large longitudinal dataset and would like to use PROC SURVEYSELECT to randomly select a 5% subsample.

The catch is that I need to keep all measures for each selected client ID; the number of measurement occasions per client varies from 1 to 10.

After an extensive online search I have not been able to find an example.

Here is the syntax I have been working with.

proc surveyselect data = have out= want
method=srs samprate=.05
by ID;
run;

Insights would be greatly appreciated.

1 ACCEPTED SOLUTION
FreelanceReinh
Jade | Level 19

Hello @agewell150 and welcome to the SAS Support Communities!

Replace the BY statement (which PROC SURVEYSELECT treats like a STRATA statement) with a CLUSTER statement:

cluster ID;

(and insert the missing semicolon after "...=.05" to terminate the PROC SURVEYSELECT statement).

Edit: I would also recommend using the SEED= option of the PROC SURVEYSELECT statement so that you can replicate your results.
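
For reference, here is a minimal sketch of how the corrected step might look with the changes described above applied. The data set names (have, want) and the ID variable are taken from the question; the seed value 12345 is only an arbitrary placeholder, and the PROC SQL check at the end is an optional addition, not part of the original syntax.

```sas
/* Sketch: draw a 5% simple random sample of client IDs and keep   */
/* every measurement occasion for each selected ID.                */
/* Assumption: seed=12345 is an arbitrary value chosen only so     */
/* that the same sample can be reproduced on a rerun.              */
proc surveyselect data=have out=want
                  method=srs samprate=0.05
                  seed=12345;
   cluster ID;   /* ID defines the sampling unit, not the row */
run;

/* Optional check: compare the number of distinct clients before   */
/* and after sampling.                                              */
proc sql;
   select count(distinct ID) as n_clients_total   from have;
   select count(distinct ID) as n_clients_sampled from want;
quit;
```

With the CLUSTER statement the sampling rate applies to clients rather than to rows, so the sampled client count should be close to 5% of the total, while the share of rows in want can differ from 5% because clients contribute between 1 and 10 rows each.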

agewell150
Fluorite | Level 6

Thanks -- that worked!