01-31-2016 08:15 AM
I have a dataset of subjects and their covariates over time (each subject has multiple rows, one per time point). I wish to perform bootstrapping with replacement, i.e., if my current N is 20 subjects, I wish to create a file of 100 subjects while keeping all their characteristics. This data will then be used in a model. Since modelling is the purpose, I need a column of unique IDs. Example: if subject #1 is chosen 3 times (and I have 5 time points), then I will have 15 rows with ID = 1, whereas I need 3 sets of 5 rows with unique IDs, such as 1a, 1b and 1c (the coding itself is not important to me; it can be anything).
My current code is:
proc surveyselect data = original_data method = urs sampsize = 100 rep = 1 seed = 12345 out = Sample_WR;
   id _all_;
   samplingunit ID;
run;
I looked through the procedure's documentation but could not find an option for this. How do I get a unique ID for each sampled unit, rather than the original subject ID?
What I basically need is a column counting the samples: sample1, sample2, ..., sample100.
Thank you!
01-31-2016 08:51 AM
I think I understand what you want, and as far as I can tell you will need to create the variable yourself.
proc surveyselect data = sashelp.shoes method=urs sampsize=50 rep=1 seed=12345 out=Sample_WR;
   id _all_;
   samplingunit region Subsidiary;
run;

/* Expand each selected row into one row per hit */
data sample_wr;
   set sample_wr;
   do sampleUnitID = 1 to numberhits;
      output;
   end;
run;

proc sort data=sample_wr;
   by Replicate region Subsidiary sampleUnitID;
run;

/* Number each (sampling unit, hit) combination within a replicate */
data sample_wr;
   set sample_wr;
   by Replicate region Subsidiary sampleunitid;
   if first.Replicate then sampleID = 0;
   if first.sampleunitid then sampleID + 1;
run;

proc print;
   where numberhits gt 1;
run;
01-31-2016 06:05 PM
Here is a straightforward approach:
/* example data */
data test;
   call streaminit(12345);
   do id = 1 to 5;
      do t = 1 to rand("Poisson", 4);
         x = rand("NORMAL");
         output;
      end;
   end;
   drop t;
run;

/* Add variable n = cluster size */
proc sql;
   create table test0 as
   select *, count(*) as n
   from test
   group by id;
quit;

/* Select sample with replacement */
proc surveyselect data=test0 out=sample0 method=urs sampsize=10 outhits seed=54321;
   cluster id;
   id n x;
run;

/* Generate new IDs */
data sample;
   set sample0;
   by id;
   if first.id then i = 0;
   i + 1;
   /* start a new ID at the first row of each block of n rows (also works when n = 1) */
   newId + (mod(i - 1, n) = 0);
   drop i id n numberHits;
run;