ZZ_Zheng
Calcite | Level 5

Hi, I hope this does not confuse readers without domain knowledge.

I want to draw 100 bootstrap samples with replacement and fit a random-effects model to each bootstrap sample.

Below is the data set I resample from. Each unique study_id represents a subject, and each subject has 8 records (quarter).

[image: screenshot of the data set]

Then I use PROC SURVEYSELECT and specify study_id as the sampling unit:

%let NumSamples = 10;       /* number of bootstrap resamples */
/* 2. Generate many bootstrap samples */
proc surveyselect data=origin seed=345
     out=Bootsample1(rename=(Replicate=SampleID))
     method=urs              /* resample with replacement */
     samprate=1              /* each sample draws as many subjects as the original */
     OUTHITS                 /* write a separate copy of a subject for each hit */
     reps=&NumSamples;       /* generate NumSamples bootstrap resamples 426*/
     samplingunit study_id;  /* resample whole subjects, not single records */
run;

After resampling, we see that study_id 10000 was selected twice, so it has 2*8 = 16 observations.

[image: screenshot of the resampled data set]
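
To see which subjects were drawn more than once, the hits can be listed per bootstrap sample. This is a minimal sketch, assuming the Bootsample1 data set created above and the NumberHits variable that PROC SURVEYSELECT adds to the OUT= data set for METHOD=URS; for a subject with 8 records, n_rows should be 8 times NumberHits (16 for a subject drawn twice).

```sas
/* List subjects drawn more than once in each bootstrap sample.     */
/* With OUTHITS, a subject with 8 records contributes               */
/* 8*NumberHits rows under the same study_id.                       */
proc sql;
   select SampleID, study_id, NumberHits, count(*) as n_rows
      from Bootsample1
      group by SampleID, study_id, NumberHits
      having NumberHits > 1
      order by SampleID, study_id;
quit;
```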

The final step is to fit a random-effects model with a random intercept for each study_id, which requires specifying the cluster (subject) variable. For this, study_id 10000 must be counted as two subjects, each with the same 8 observations and results, rather than as one subject with 16 observations. For example, the second set of 8 duplicate records for study_id 10000 could be relabeled as a different study_id, 10000A, while the first set keeps study_id 10000, giving two subjects, 10000 and 10000A, with identical observations and results.

proc glimmix data=zheng_no0  method=quad(qpoints=10);  /*edcn  all factors latent class risk_score*/
	class study_id pred3class(ref='2')  gender age_cat(ref='1') ge_12mo_flag(ref='0');
	effect spl = spline(log_risk/naturalcubic degree=3 knotmethod=percentiles(3));
	model pct_sum = pred3class gender age_cat spl ge_12mo_flag/link=log s dist=poisson offset=logt;
	random int / subject=study_id;
run;
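
The GLIMMIX step above fits the model once. To fit it to every bootstrap sample, one option is BY-group processing on SampleID, with printed output turned off and the results collected through ODS OUTPUT. This is a sketch under assumptions: boot_data is a hypothetical name for a bootstrap data set that is sorted by SampleID, carries the model variables from zheng_no0, and already has study_id made unique per hit (for example 10000 and 10000A); the CLASS, EFFECT, MODEL, and RANDOM settings are copied from the model above.

```sas
/* Assumption: boot_data (hypothetical name) is sorted by SampleID, */
/* holds the model variables, and has study_id unique per hit.      */
ods graphics off;
ods exclude all;                  /* suppress printed output for all replicates */
proc glimmix data=boot_data method=quad(qpoints=10);
   by SampleID;                   /* one model fit per bootstrap sample */
   class study_id pred3class(ref='2') gender age_cat(ref='1') ge_12mo_flag(ref='0');
   effect spl = spline(log_risk/naturalcubic degree=3 knotmethod=percentiles(3));
   model pct_sum = pred3class gender age_cat spl ge_12mo_flag
         / link=log s dist=poisson offset=logt;
   random int / subject=study_id; /* each relabeled hit is its own subject */
   ods output ParameterEstimates=boot_pe     /* fixed-effect solutions */
              ConvergenceStatus=boot_conv;   /* convergence status per sample */
run;
ods exclude none;
```

The bootstrap distribution of each coefficient can then be summarized from Estimate in boot_pe, after setting aside samples that did not converge.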

Is there any way I could just change the study_id? Thanks!
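
One way to get the same effect without renaming study_id is to resample the list of distinct subjects, number each hit, and join each hit back to that subject's records, so every hit gets its own cluster ID. This is a sketch under assumptions, not the approach taken in this thread: the names ids, boot_ids, boot_id, and bootsample2 are hypothetical, and the toy ORIGIN data is rebuilt with a DO loop so the block runs on its own.

```sas
/* Toy data (hypothetical rebuild of ORIGIN): 5 subjects x 8 quarters */
data origin;
   length study_id $10;
   do study_id = '1', '10', '100', '1000', '1001';
      do quarter = 1 to 8;
         output;
      end;
   end;
run;

%let NumSamples = 10;

/* One row per subject, so each draw selects a whole subject */
proc sort data=origin(keep=study_id) out=ids nodupkey;
   by study_id;
run;

/* Draw N subjects with replacement; OUTHITS writes one row per hit */
proc surveyselect data=ids seed=345 noprint
     out=boot_ids(rename=(Replicate=SampleID))
     method=urs samprate=1 outhits reps=&NumSamples;
run;

/* Number the hits 1, 2, ..., N within each bootstrap sample */
data boot_ids;
   set boot_ids;
   by SampleID;
   if first.SampleID then boot_id = 0;
   boot_id + 1;                    /* sum statement: retained counter */
   keep SampleID study_id boot_id;
run;

/* Attach the subject's 8 records to every hit of that subject */
proc sql;
   create table bootsample2 as
   select b.SampleID, b.boot_id, o.study_id, o.quarter
      from boot_ids as b inner join origin as o
      on b.study_id = o.study_id
      order by SampleID, boot_id, quarter;
quit;
```

In the model step, boot_id would then go in the CLASS statement and be used as subject=boot_id together with by SampleID, so the IDs only need to be unique within a sample.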

Accepted Solution
Kurt_Bremser
Super User

Add a data step that manipulates study_id. Since quarter is sorted in ascending order within each hit, a drop in quarter (from 8 back to 1) marks the start of the next hit:

data origin;
length study_id $10;
input study_id quarter;
datalines;
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
10 1
10 2
10 3
10 4
10 5
10 6
10 7
10 8
100 1
100 2
100 3
100 4
100 5
100 6
100 7
100 8
1000 1
1000 2
1000 3
1000 4
1000 5
1000 6
1000 7
1000 8
1001 1
1001 2
1001 3
1001 4
1001 5
1001 6
1001 7
1001 8
;
run;
%let NumSamples = 10;       /* number of bootstrap resamples */
/* 2. Generate many bootstrap samples */
proc surveyselect data=origin seed=345
     out=Bootsample1(rename=(Replicate=SampleID))
     method=urs              /* resample with replacement */
     samprate=1              /* each sample draws as many subjects as the original */
     OUTHITS                 /* write a separate copy of a subject for each hit */
     reps=&NumSamples;       /* generate NumSamples bootstrap resamples 426*/
     samplingunit study_id;  /* resample whole subjects, not single records */
run;

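data bootsample1_a;
   set bootsample1;
   retain addstring ' abcdefghij';   /* suffix per hit: blank, a, b, ... (up to 11 hits) */
   by sampleid study_id;
   l_q = lag(quarter);               /* quarter on the previous record */
   if first.study_id
   then i = 1;                       /* first hit of this study_id in this sample */
   else if l_q > quarter then i + 1; /* quarter dropped back to 1: next hit starts */
   study_id = cats(study_id,substr(addstring,i,1));  /* 100, 100a, 100b, ... */
   drop addstring i l_q;
run;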

Partial output:

Obs  SampleID  study_id  quarter  NumberHits
  9         1  100             1           3
 10         1  100             2           3
 11         1  100             3           3
 12         1  100             4           3
 13         1  100             5           3
 14         1  100             6           3
 15         1  100             7           3
 16         1  100             8           3
 17         1  100a            1           3
 18         1  100a            2           3
 19         1  100a            3           3
 20         1  100a            4           3
 21         1  100a            5           3
 22         1  100a            6           3
 23         1  100a            7           3
 24         1  100a            8           3
 25         1  100b            1           3
 26         1  100b            2           3
 27         1  100b            3           3
 28         1  100b            4           3
 29         1  100b            5           3
 30         1  100b            6           3
 31         1  100b            7           3
 32         1  100b            8           3
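
After relabeling, every study_id within a SampleID should hold exactly one subject's 8 records. A quick check, assuming bootsample1_a from the data step above and 8 quarters per subject as in the sample data, is to list any exceptions; an empty result means each hit now has its own ID.

```sas
/* Any row returned here is a relabeled study_id that does not */
/* hold exactly one subject's 8 records.                       */
proc sql;
   select SampleID, study_id, count(*) as n_rows
      from bootsample1_a
      group by SampleID, study_id
      having count(*) ne 8;
quit;
```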

5 Replies
ZZ_Zheng
Calcite | Level 5

[image: screenshot of the original data set]

This is the original data set used for resampling. The resampling unit is study_id, i.e., each draw selects the whole cluster of 8 observations for one study_id. Thanks!!

Kurt_Bremser
Super User

Please DO NOT post data as pictures. Nobody here likes to waste time tediously typing things off a screenshot and correcting typos all the time.

Always post SAS datasets as data steps with datalines, so we can create an exact replica of your dataset (including all lengths, formats, etc.) for testing.
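
One way to produce that DATALINES text from an existing data set is a DATA _NULL_ step that writes rows to the log, which can then be pasted into a DATA step with LENGTH, INPUT, and DATALINES statements. A minimal sketch, assuming the ORIGIN data set with character study_id and numeric quarter; the variable list must be adapted for other data sets, and character values containing blanks would need a different INPUT style.

```sas
/* Write the first 16 rows of ORIGIN to the log as space-separated */
/* values that can be pasted after a DATALINES statement.          */
data _null_;
   set origin(obs=16);
   put study_id quarter;   /* list output: one row per line */
run;
```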

ZZ_Zheng
Calcite | Level 5
data origin;
length study_id $10;
input study_id quarter;
datalines;
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
10 1
10 2
10 3
10 4
10 5
10 6
10 7
10 8
100 1
100 2
100 3
100 4
100 5
100 6
100 7
100 8
1000 1
1000 2
1000 3
1000 4
1000 5
1000 6
1000 7
1000 8
1001 1
1001 2
1001 3
1001 4
1001 5
1001 6
1001 7
1001 8
;
run;

Hi @Kurt_Bremser and @All. Sorry, I should have created simple sample data to begin with. My raw data is large and confidential, and I am using SAS Enterprise on an offline system. Thanks!
