mili
Fluorite | Level 6

 

Hello, 

I have 5 imputed datasets (imput&j, j = 1 to 5). For each one, I'd like to do the following 200 times (i = 1 to 200):

1) resample with replacement the same number of observations as in the original imputed dataset
 
2) in this bootstrap dataset, run a Cox proportional hazards regression with backward selection
proc phreg data = boot&i;
    model survival * censored (1) = R S T U V W X Y Z / selection = backward;
    ods output parameterestimates = estim&i;
run;
 
3) the dataset estim&i has a variable called "parameter", and each observation holds the name of one of the variables (R S T U V W X Y Z) that was retained by the backward selection in the previous step. I'd like to apply these selected variables to find the corresponding Harrell concordance c-statistic
 
(Inline image 1: listing of dataset estim&i)
 
proc phreg data = boot&i;
    model survival * censored (1) = values that the variable "parameter" takes in estim&i, here (R U W X Z);
    ods output concordance = cstatboot&i;
run;
 
4) find the Harrell concordance c-statistic in the original imputed dataset using these same variables
 
proc phreg data = imput&j;
    model survival * censored (1) = values that the variable "parameter" takes in estim&i, here (R U W X Z);
    ods output concordance = cstatimput&i;
run;
 
5) the datasets cstatboot&i and cstatimput&i each have the variables "estimate" and "stderr". I'd like to obtain the differences between those values:
               estimate from cstatboot&i - estimate from cstatimput&i
               stderr from cstatboot&i - stderr from cstatimput&i
 
6) obtain the average of these differences (average difference in estimate +/- average difference in stderr) over the 200 resamplings drawn from each imputed dataset
 
Thank you for your help,
Much appreciated
Reeza
Super User

Don't be Loopy - here's a full write up on doing simulations in SAS

http://www2.sas.com/proceedings/forum2007/183-2007.pdf
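For reference, the BY-group idea from that paper can be sketched as below. This is only a rough illustration, not code from the thread: SASHELP.HEART stands in for one imputed dataset, and the variables survival, censored and male, the 32-year censoring time and the seed are all assumptions made up so the example runs. It draws all 200 bootstrap samples in one PROC SURVEYSELECT call and runs the backward selection once with BY Replicate instead of looping.

```sas
/* Illustrative data only: survival, censored and male are invented here. */
data work.heart;
   set sashelp.heart;
   censored = (Status = 'Alive');
   if censored then survival = 32;            /* pretend censoring time (assumption) */
   else survival = AgeAtDeath - AgeAtStart;
   male = (Sex = 'Male');
run;

/* 200 bootstrap samples, each the size of the input, stacked in one data set. */
/* SURVEYSELECT adds the variable Replicate (1 to 200).                        */
proc surveyselect data=work.heart out=work.boot noprint
     seed=12345 method=urs samprate=1 outhits reps=200;
run;

/* One PHREG call, one backward selection per replicate. */
ods exclude all;                              /* no printed output, ODS OUTPUT still works */
proc phreg data=work.boot;
   by Replicate;
   model survival*censored(1) = male Systolic Smoking Cholesterol Weight
         / selection=backward;
   ods output ParameterEstimates=work.selected;
run;
ods exclude none;
```

work.selected should then hold the retained parameters for every replicate, identified by Replicate. The later steps need a different MODEL statement per replicate, so they may still require a macro loop or generated code.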

 

 

It's not clear what your question for us here is...can you be more specific on what exactly you need assistance with?

 



 

mili
Fluorite | Level 6
Thanks for your reply.

I just realized I've omitted the following in my code in my initial post:

3) proc phreg data = *boot&i **concordance=harrell (se);*
model survival * censored (1) = * values that the variable "parameter"
takes in estim.&i , here (R U W X Z)*;
ods output concordance =* cstatboot&i *;
run;

4) find out the Harrell concordance c-statistic in the original imputated
dataset using these variables

proc phreg data = *imput&j **concordance=harrell (se);*
model survival * censored (1) = * values that the variable "parameter"
takes in estim.&i , here (R U W X Z)*;
ods output concordance =* cstatimput&i *;
run;

I've read the "Don't be Loopy" document, but I cannot find what I am
looking for:

1) how to code to use the variable retained by the backward selection into
a proc phreg procedure (step #3 and 4 from my initial post) to obtain the
Harrell concordance statistics.

2) how to code to output the results of these 2 Harrell statistics to be
able to obtain the average difference between them.

I hope this is clearer...
Reeza
Super User

Post your code using the code boxes. I can't tell if the asterisks are part of your code or something the forum added, because they don't make sense to me. 

 

So your questions are the following?

 

1) how to code to use the variable retained by the backward selection into
a proc phreg procedure (step #3 and 4 from my initial post) to obtain the
Harrell concordance statistics.

2) how to code to output the results of these 2 Harrell statistics to be
able to obtain the average difference between them.

 

Can you provide a worked example using one of the SASHELP data sets, maybe HEART or one from the docs so we can run something and help you out? 

 

Otherwise for:

 

1. Use ODS OUTPUT with the ParameterEstimates table to get the variables retained in the model. You can feed those to your next step by building a macro variable list from the variable names.

2. Not sure without seeing output. 
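Point 1 above could look roughly like the sketch below. It is an illustration under assumptions, not tested against the poster's data: SASHELP.HEART replaces the bootstrap sample, and survival, censored, male, the 32-year censoring time and the macro variable name selected are all made up for the example. It also assumes that, without the DETAILS option, the ParameterEstimates table contains only the final model, and that the installed SAS/STAT release supports the CONCORDANCE= option.

```sas
/* Illustrative data only: survival, censored and male are invented here. */
data work.heart;
   set sashelp.heart;
   censored = (Status = 'Alive');
   if censored then survival = 32;            /* pretend censoring time (assumption) */
   else survival = AgeAtDeath - AgeAtStart;
   male = (Sex = 'Male');
run;

/* Backward selection; keep the final parameter estimates in a data set. */
proc phreg data=work.heart;
   model survival*censored(1) = male Systolic Smoking Cholesterol Weight
         / selection=backward;
   ods output ParameterEstimates=work.estim1;
run;

/* Collect the retained variable names into one space-separated macro variable. */
%let selected=;
proc sql noprint;
   select distinct Parameter into :selected separated by ' '
   from work.estim1;
quit;
%put NOTE: retained variables = &selected;

/* Refit with only the retained variables and save Harrell's C and its SE. */
proc phreg data=work.heart concordance=harrell(se);
   model survival*censored(1) = &selected;
   ods output Concordance=work.cstat1;
run;
```

If the selection drops every variable, no ParameterEstimates table is written and &selected stays empty, so a real loop would need a check for that case.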

 

I'll move this to the statistical procedures forum where someone else may be able to help as well. 

mili
Fluorite | Level 6

Yes, these are my 2 questions! Here is a very rough coding attempt using the Heart dataset as an example. The "??" mark my questions, i.e. the places where I do not know how to write the code.

 

Thank you so much!!

 

/* from SASHELP data sets: HEART;
	let's pretend: 	status = 1 -> alive
					status = 0 -> dead
					
					survival = AgeAtDeath - AgeAtStart 

I'd like to do the following 200 times (i = 1 to 200)


1) resample with replacement the same number of observations as in the original imputed dataset
2) in this bootstrap dataset, run a Cox proportional hazards regression with backward selection
3) Apply the variables that were selected by the backward procedure, i.e. those listed under the variable "parameter" in
	the dataset "&outdata.predictor&i", and find the corresponding Harrell concordance c-statistic for this model in this bootstrap
	sample. Add the "estimate" and "stderr" of the Harrell c-statistic to the dataset "performance", where "estimate"
	is stored in the variable estboot and "stderr" in the variable stdboot, on the row for the corresponding
	bootstrap number boot (thus &i).
4) Apply the same selected variables (from "&outdata.predictor&i") to the original dataset
	-> &indataset., find the corresponding Harrell concordance c-statistic for this model in the original dataset,
	and also add it to the dataset "performance", where "estimate"
	is stored in the variable estimpute and "stderr" in the variable stdimpute, on the row for the corresponding
	bootstrap number boot (thus &i).
5) in the "performance" dataset: compute "estimate original - estimate bootstrap" and
	"stderr original - stderr bootstrap" so that the average difference can be obtained for the estimate and the stderr

*/

%macro resample (indataset=, outdata=, reps=, size=);
%do i=1 %to &reps;

	/* 1) bootstrap sample: &size observations drawn with replacement */
	proc surveyselect data = &indataset. out = &outdata.&i. noprint
		method = urs
		sampsize = &size outhits;
	run;

	/* 2) backward selection in the bootstrap sample */
	proc phreg data = &outdata.&i;
		model survival * status (1) = sex Systolic Smoking Cholesterol Weight / selection = backward;
		ods output parameterestimates = &outdata.predictor&i;
	run;

	/* 3) Harrell c-statistic in the bootstrap sample */
	proc phreg data = &outdata.&i concordance=harrell (se);
		model survival * status (1) = /* ?? -> values that the variable parameter takes in &outdata.predictor&i */ ;
		/* ?? from the concordance output:
				estimate is added to the column variable estboot on the observation line for bootstrap # (&i)
					in the dataset performance
				stderr is added to the column variable stdboot on the observation line for bootstrap # (&i)
					in the dataset performance */
	run;

	/* 4) Harrell c-statistic in the original dataset (not &indataset.&i, which does not exist) */
	proc phreg data = &indataset. concordance=harrell (se);
		model survival * status (1) = /* ?? -> values that the variable parameter takes in &outdata.predictor&i */ ;
		/* ?? from the concordance output:
				estimate is added to the column variable estimpute on the observation line for bootstrap # (&i)
					in the dataset performance
				stderr is added to the column variable stdimpute on the observation line for bootstrap # (&i)
					in the dataset performance */
	run;

%end;
%mend;

%resample (indataset=sashelp.heart, outdata=work.bootstrap, reps=200, size=5209);


data summary; set performance;
		diff_estim = estimpute - estboot;
		diff_stderr = stdimpute - stdboot;
	run;
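
One way the "??" parts might be filled in is sketched below. It is a hedged illustration rather than a confirmed answer from this thread. Assumptions: the derived variables survival, censored and male, the 32-year censoring time, the macro name optimism, the seed parameter, the dataset names work.cboot, work.corig and work.one, and the _sel suffix are all invented for the example; the Concordance ODS table is assumed to have one row with the variables Estimate and StdErr, as described in the first post. It refits the selected variables in the original data, as in step 4 of the post, rather than applying the coefficients estimated in the bootstrap sample.

```sas
/* Illustrative data only: survival, censored and male are invented here. */
data work.heart;
   set sashelp.heart;
   censored = (Status = 'Alive');
   if censored then survival = 32;            /* pretend censoring time (assumption) */
   else survival = AgeAtDeath - AgeAtStart;
   male = (Sex = 'Male');
run;

%macro optimism(indataset=, outdata=, reps=, seed=12345);
   %local i selected;
   proc datasets lib=work nolist nowarn; delete performance; quit;
   ods exclude all;                           /* keep the log and listing small */
   %do i = 1 %to &reps;
      /* 1) bootstrap sample of the same size as the input */
      proc surveyselect data=&indataset out=&outdata&i noprint
           method=urs samprate=1 outhits seed=%eval(&seed + &i);
      run;
      /* 2) backward selection in the bootstrap sample */
      proc phreg data=&outdata&i;
         model survival*censored(1) = male Systolic Smoking Cholesterol Weight
               / selection=backward;
         ods output ParameterEstimates=&outdata._sel&i;
      run;
      /* names of the retained variables, if any */
      %let selected=;
      %if %sysfunc(exist(&outdata._sel&i)) %then %do;
         proc sql noprint;
            select distinct Parameter into :selected separated by ' '
            from &outdata._sel&i;
         quit;
      %end;
      %if %length(&selected) > 0 %then %do;
         /* 3) Harrell's C in the bootstrap sample */
         proc phreg data=&outdata&i concordance=harrell(se);
            model survival*censored(1) = &selected;
            ods output Concordance=work.cboot;
         run;
         /* 4) Harrell's C in the original data, same variables */
         proc phreg data=&indataset concordance=harrell(se);
            model survival*censored(1) = &selected;
            ods output Concordance=work.corig;
         run;
         /* 5) one row per replicate with both statistics and their differences */
         data work.one;
            merge work.cboot(keep=Estimate StdErr
                             rename=(Estimate=estboot StdErr=stdboot))
                  work.corig(keep=Estimate StdErr
                             rename=(Estimate=estimpute StdErr=stdimpute));
            boot = &i;
            diff_estim  = estimpute - estboot;
            diff_stderr = stdimpute - stdboot;
         run;
         proc append base=work.performance data=work.one; run;
      %end;
   %end;
   ods exclude none;
%mend optimism;

%optimism(indataset=work.heart, outdata=work.bootstrap, reps=200);

/* 6) average differences over the replicates */
proc means data=work.performance n mean;
   var diff_estim diff_stderr;
run;
```

With five imputed datasets, the macro could be called once per imput&j, but work.performance would have to be copied or renamed after each call, because the macro deletes it at the start.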
