mahim
Calcite | Level 5
I am trying to:
1. Sample repeated-measures data by subject ID for cross-validation of a GENMOD model,
2. Bootstrap a dataset because of a highly skewed outcome measure recorded in an unbalanced long format (e.g., a patient may have 5 observations at t1, 0 at t2, 3 at t3, and so on, where the t's are time points).
 
Is there a well-explained macro or guide for resampling or bootstrapping repeated-measures data by subject ID? I want to resample by ID so that all observations for a subject stay together in either the training set or the test set. Please correct me if I am wrong: I believe simple random sampling with replacement, which can be done with proc surveyselect, would not be an efficient method for repeated-measures data, because it samples individual observations rather than subjects. How can I modify proc surveyselect to incorporate subject IDs?
PGStats
Opal | Level 21

@Rick_SAS has posted many great entries on the bootstrap. Look here:

https://blogs.sas.com/content/?s=bootstrap


To make the subject ID the (re-)sampling unit, go for cluster sampling, where the subject is the cluster.
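
The idea of treating the subject as the cluster can be illustrated with a minimal, language-neutral sketch. It is written in Python rather than SAS and is not the syntax of any SAS procedure; the function names (`cluster_bootstrap`, `subject_folds`), the column names (`id`, `t`, `y`), and the `boot_id` relabeling are all hypothetical choices made for illustration. It assumes the data are in long format, one row per observation. Subjects are drawn with replacement and every row of a drawn subject is kept. For cross-validation, folds are assigned per subject rather than per row.

```python
import random
from collections import defaultdict


def cluster_bootstrap(rows, id_key="id", seed=None):
    """Draw one bootstrap sample in which the subject is the sampling unit.

    Subjects are drawn with replacement; all rows of a drawn subject are kept.
    Each draw gets a new label (boot_id) so that a subject drawn twice can be
    treated as two separate clusters when the model is refit (an assumption of
    this sketch, not something stated in the thread).
    """
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for row in rows:
        by_subject[row[id_key]].append(row)
    ids = sorted(by_subject)
    drawn = rng.choices(ids, k=len(ids))  # sampling with replacement
    sample = []
    for draw_no, subject in enumerate(drawn):
        for row in by_subject[subject]:
            sample.append({**row, "boot_id": draw_no})
    return sample


def subject_folds(rows, k=5, id_key="id", seed=None):
    """Assign each subject (not each row) to one of k cross-validation folds."""
    rng = random.Random(seed)
    ids = sorted({row[id_key] for row in rows})
    rng.shuffle(ids)
    return {subject: i % k for i, subject in enumerate(ids)}


# Toy unbalanced long-format data: subject A has 3 rows, B has 1, C has 2.
rows = [
    {"id": "A", "t": 1, "y": 2.0},
    {"id": "A", "t": 1, "y": 2.5},
    {"id": "A", "t": 3, "y": 9.1},
    {"id": "B", "t": 2, "y": 0.4},
    {"id": "C", "t": 1, "y": 1.2},
    {"id": "C", "t": 2, "y": 7.7},
]

boot = cluster_bootstrap(rows, seed=1)
fold_of = subject_folds(rows, k=2, seed=1)
# Rows whose subject is in fold 0 form the test set; the rest form the training set.
test_rows = [r for r in rows if fold_of[r["id"]] == 0]
train_rows = [r for r in rows if fold_of[r["id"]] != 0]
```

Because a subject is either drawn or not, and falls in exactly one fold, no subject's observations are split between the training and test sets, which is the requirement stated in the question.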

PG
fbaumer
Calcite | Level 5

Hi - I have this exact same question. I've been looking through the recommended link and have not found a clear answer in the documentation. Did you ever find a way to do this? I have done a year of master's-level statistics, so I rely on very specific instructions when I try one of these models (rather than extrapolating from related things like proc mixed).

 

Thank you!