Re: Saving Unique IDs from a Bootstrap sample

BlueNose · Posted 01-31-2016 08:15 AM

Hello all,

I have a dataset of subjects and their covariates over time (each subject has multiple rows, one per time point). I wish to bootstrap with replacement: if my current N is 20 subjects, I want to create a file of 100 subjects while keeping all their characteristics. This data will then be used in a model. Because modelling is the purpose, I need a column of unique IDs. Example: if subject #1 is chosen 3 times (and I have 5 time points), I will get 15 rows with ID = 1, whereas I need 3 sets of 5 rows with unique IDs, such as 1a, 1b and 1c (the coding is not important to me; it can be anything).

My current code is:

proc surveyselect data = original_data method = urs sampsize = 100 rep = 1 seed = 12345 out = Sample_WR;
id _all_;
samplingunit ID;
run;

I looked through the procedure's documentation but couldn't find an option for this. How do I get unique IDs for my sampled units, rather than my original subject IDs?

What I basically need is a column counting the samples: sample1, sample2, ..., sample100.

Thank you !

data_null__ · Posted 01-31-2016 08:51 AM

I think I may understand what you want, and as far as I can tell you will need to create the variable yourself.

proc surveyselect data = sashelp.shoes method=urs sampsize=50 rep=1 seed=12345 out=Sample_WR;
   id _all_;
   samplingunit region Subsidiary;
   run;
data sample_wr;
   set sample_wr;
   do sampleUnitID=1 to numberhits;
      output;
      end;
   run;
proc sort data=sample_wr;
   by Replicate region Subsidiary sampleUnitID;
   run;
data sample_wr;
   set sample_wr;
   by Replicate region Subsidiary sampleunitid;
   if first.Replicate then sampleID = 0;
   if first.sampleunitid then sampleID + 1;
   run;
proc print;
   where numberhits gt 1;
   run;
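
If a character label such as sample1, sample2, ... is preferred over the numeric sampleID, a minimal sketch could look like the following. It assumes the sample_wr data set and sampleID variable created above; the names sample_labeled and sampleLabel are made up for illustration.

```sas
/* Build a character label from the numeric sampleID */
data sample_labeled;
   set sample_wr;
   length sampleLabel $ 12;                /* room for "sample" plus the number */
   sampleLabel = cats('sample', sampleID); /* cats strips blanks and concatenates */
   run;
```

Either variable should work as the subject identifier in the model; the label is only cosmetic.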

PGStats · Posted 01-31-2016 06:05 PM

Here is a straightforward approach:

/* example data */ 
data test;
call streaminit(12345);
do id = 1 to 5;
    do t = 1 to rand("Poisson", 4);
        x = rand("NORMAL");
        output;
        end;
    end;
drop t;
run;

/* Add variable n = cluster size */
proc sql;
create table test0 as
select *, count(*) as n  
from test
group by id;
quit;

/* Select sample with replacement */
proc surveyselect data=test0 out=sample0 method=urs sampsize=10 outhits seed=54321;
cluster id;
id n x;
run;

/* Generate new IDs */
data sample;
set sample0; by id;
if first.id then i = 0;
i + 1;
/* Start a new ID every n rows within id; also works when n = 1 */
if mod(i - 1, n) = 0 then newId + 1;
drop i id n numberHits;
run;
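
One possible sanity check, assuming the sample data set and newId variable from the step above (the alias nRows is made up for illustration), is to count the rows under each new ID:

```sas
/* Count rows per generated ID */
proc sql;
select newId, count(*) as nRows
from sample
group by newId;
quit;
```

Each newId is meant to stand for one drawn copy of a subject, so its row count should match the number of time points that subject had in the original data.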

PG