Hi,
I am running a Cox model with a time-varying exposure. Exact age is the time scale, and the exposure varies by calendar year. The following code produces my reference results:
proc phreg data = dat ;
model age* outcome(0) = var_pm25 edu sex center/ rl entry=age0;
array pm25 {15} pm25_1999 - pm25_2013 ;
do i = 1 to 15;
if (age1999+i-1)<age<=(age1999+i) then var_pm25= pm25[i];
end;
run;
How can I recreate the above results without the programming statements? I tried moving the array into a DATA step before the procedure, and the array does pull the exposure I am interested in, but the total sample size, the number of events, the number of missing values, and the hazard ratio are all (very) different. I think it has something to do with when the exposure is pulled. Code below.
data test;
set dat;
array pm25 {15} pm25_1999 - pm25_2013 ;
array pm25_new {15} _1999 - _2013 ;
do i = 1 to 15;
if (age1999+i-1)<age<=(age1999+i) then pm25_new[i]= pm25[i]/5;
end;
run;
data test2;
set test;
pm25_new=sum (of _1999-_2013);
run;
proc phreg data = test2 ;
model age* outcome(0) = pm25_new edu sex center / rl entry=age0;
run;
Thank you.
Hi @pamplemouse22,
I think your current approach (using dataset test2) cannot work because variable pm25_new is (necessarily) constant, while it ought to be time-dependent (like var_pm25). It should be possible, however, to omit the programming statements in the PROC PHREG step if you create an input dataset with a different structure (and use the slightly different syntax for such datasets in the MODEL statement): see Counting Process Style of Input. There you have multiple observations per subject, corresponding to disjoint time intervals, which enables you to define different values of a time-dependent covariate in different time intervals.
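As a rough illustration of that restructuring, here is a minimal sketch, not a tested replacement for your model. It assumes that outcome is numeric with 0 meaning censored, that follow-up outside the 15 exposure years can be dropped (which may not match how PROC PHREG treats a missing var_pm25 in your original run), and the names cp, start, stop and event are made up for the example.

```sas
/* Sketch: one row per subject and exposure year (counting process style). */
/* cp, start, stop and event are hypothetical names; everything else       */
/* comes from the original post.                                           */
data cp;
  set dat;
  array pm25 {15} pm25_1999 - pm25_2013;
  do i = 1 to 15;
    /* Year i covers ages (age1999+i-1, age1999+i]; clip it to follow-up. */
    start = max(age0, age1999 + i - 1);
    stop  = min(age,  age1999 + i);
    if start < stop then do;
      var_pm25 = pm25[i];
      /* Only the interval that ends at the exit age can carry the event. */
      event = (stop = age) * outcome;
      output;
    end;
  end;
  keep start stop event var_pm25 edu sex center;
run;

proc phreg data = cp;
  /* (start, stop) replaces age with entry=age0; no programming statements. */
  model (start, stop) * event(0) = var_pm25 edu sex center / rl;
run;
```

Each subject then contributes to a risk set only through the row whose interval contains the event age, so var_pm25 takes the value for that year, which is what the IF condition in your programming statements selects.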
If the approach using var_pm25 works, why do you want to eliminate the programming statements from the PROC PHREG step?
Just to check that I properly understand how it works. I guess I don't, since I am not getting the numbers to match! I also have to make some small changes, and I need to understand this well enough to feel confident in them.
If in doubt, I would probably do a simulation, i.e., create artificial data based on a (Cox-PH) model with known coefficients. I think a simplified model with the time-dependent covariate as the only predictor might be sufficient for this purpose. Chapter 12 of Simulating Data with SAS® should contain instructions for that, but I don't have this book yet.
Why did you change the logic to divide by 5? And then SUM?
What happens if you just use the same logic?
data test;
set dat;
array pm25 {15} pm25_1999 - pm25_2013 ;
do i = 1 to 15;
if (age1999+i-1)<age<=(age1999+i) then var_pm25 = pm25[i];
end;
run;
proc phreg data = test;
model age* outcome(0) = var_pm25 edu sex center/ rl entry=age0;
run;
I don't think you understand what the DO loop was doing.
array pm25 {15} pm25_1999 - pm25_2013 ;
do i = 1 to 15;
if (age1999+i-1)<age<=(age1999+i) then var_pm25 = pm25[i];
end;
It was NOT generating 15 values of VAR_PM25. It was just finding which value to use based on the value of AGE compared to the value of AGE1999. So if the subject was 50 years old in 1999 and in this record they are now 51.5, it will use the value from the pm25_2000 variable. (The interval is open on the left, so an age of exactly 51 still maps to pm25_1999.)
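To see the lookup in isolation, here is a small made-up check (the dataset demo and the variable year are hypothetical, and the ages are invented). It records which calendar year the condition selects for a subject with age1999 = 50.

```sas
/* Sketch: which exposure year does the IF condition pick for a given age? */
data demo;
  age1999 = 50;
  do age = 50.5, 51, 51.5;
    year = .;
    do i = 1 to 15;
      /* Same condition as above; index i corresponds to year 1998 + i. */
      if (age1999 + i - 1) < age <= (age1999 + i) then year = 1998 + i;
    end;
    output;
  end;
  drop i;
run;

proc print data = demo;
run;
```

Ages 50.5 and 51 both fall in the first interval (50, 51] and give year 1999, while 51.5 falls in (51, 52] and gives year 2000. At most one of the 15 conditions can be true for any single value of age.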
Why would the logic of a DO loop be different? The only difference I know of is that in PHREG the ARRAY statement requires the {15} after the array name. In a DATA step, SAS is smart enough to know that pm25_1999 - pm25_2013 is 15 variables without that extra bit of code in the ARRAY statement.
Did you try the code I posted? Did you get matching results?