zedstar
Calcite | Level 5

Hi all. Sorry about the length of the post but I wanted to be comprehensive in terms of context.

 

I have data of exits from an employment program over the course of a year. In survival analysis terms, the 'failure' variable is leaving the program (which is a positive thing as it means the individual has found employment). There are two groups I wish to compare in terms of 'failures'. Some people are in Group A, which is a 'standard' program over the course of a year. Some people are in Group B, which is the 'standard' program for the first six months, and then at the six month mark Group B is subject to additional participation requirements.

 

It has been found before that when people are subject to additional participation requirements (Group B), they leave the program more quickly around the time the additional requirements are imposed, compared to people without the additional requirements (Group A). Individuals know when the additional requirements are coming up, so they might leave shortly before or shortly after the requirements are imposed.

 

Therefore, when comparing the groups, I would expect similar numbers of exits for the first 5 months or so, but then I would expect increased exits for a few weeks for Group B. In other words, I expect an interaction of time (measured in fortnights) and Group and that the hazard ratio will not be constant over time.

 

The two groups are also not random but quasi-experimental, so there are additional covariates (such as participant gender, age, local unemployment rate, etc) that are included. I'm not directly interested in these variables; I just want to hold them constant between the groups.

 

The particular problems I'm having are the following:

  1. Unlike PROC LIFETEST, PROC PHREG does not produce a hazard rate. I'd like to know the percentage that leave each week, with the population at the beginning of the week as the base. I can calculate this 'manually', but the fact that SAS doesn't produce it makes me wonder whether it is an invalid or incoherent thing to ask for.
  2. When I look at the survival graph and the other output, the expected 'spike' in exits (of Group B relative to Group A) does not seem to be reflected, whereas in the raw data the spike in exits for Group B around the six-month mark is obvious. I thought that my specification of the interaction variable would be sufficient to allow the hazard ratio to vary at each fortnight, but the output seems to 'smooth' the hazard ratio over the graph, and both groups now appear to have a 'spike' instead of Group B specifically.
  3. I can get SAS to produce hazard ratios at each fortnight of the interaction, but again the hazard rate I calculate for each group does not seem to relate to the survival function produced by PROC PHREG.
  4. I want to compare the survival/failure lines of the two groups, so I am using DIRADJ to adjust the covariates to equalize the groups. I do not want to use reference groups, as there are many class variables (more than shown here) and none of them are particularly 'representative' of the caseload. Is DIRADJ doing what I want by equalizing the GROUPS as if each contained the 'average' population of both groups?

This is the SAS code I am using for the PHREG (I have dropped some of the covariates for clarity of reading, and I've perturbed the data behind any output). The data are in long format: one row per fortnight per participant until they leave the program if they leave within a year, right-censored at a year (26 fortnights) if they do not leave the program. I have included indicative output.

 

ods graphics on;
proc phreg data=exits plots(overlay=stratum)=(survival);
class GROUP GENDER REMOTENESS fortnight /param=ref ref=first order=internal;
model (start,stop)*exit(0) = GROUP AGE GENDER REMOTENESS LAST_SCORE fortnight * GROUP / ties=efron alpha=0.01 rl;
baseline out=exit_out survival=_all_ /diradj group=GROUP;
hazardratio GROUP / at (fortnight=ALL) alpha=0.01 ;
run;
ods graphics off;

Hazard Ratios for GROUP (GROUP_B vs GROUP_A)

Fortnight   Point Estimate
1           1.090
2           1.258
3           1.240
4           1.155
5           1.075
6           1.117
7           0.942
8           0.941
9           1.006
10          0.962
11          0.977
12          1.173
13          1.310
14          1.347
15          1.322
16          1.381
17          1.389
18          1.144
19          1.206
20          1.072
21          1.196
22          1.117
23          0.906
24          1.083
25          0.806
26          0.761
27          0.729
GROUP GROUP_B vs GROUP_A At fortnight=270.729

 

The graphs below are not the original data but they illustrate the kind of thing happening to the original.

 

Below on the right is a graph of 'raw' exits showing the percentage leaving each fortnight, with the denominator being the number of people left at the beginning of the fortnight. The two lines are Group A and B.
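
For reference, the raw rate described above could be computed directly from the long-format data along these lines. This is only a sketch: it assumes the `exits` data set from the PHREG call has exactly one row per participant per fortnight at risk, and that `exit` is numeric with 1 marking the fortnight in which the participant left. The output names `raw_hazard`, `n_at_risk`, `n_exits` and `exit_rate` are made up for illustration.

```sas
/* Raw fortnightly exit rate per group:                          */
/* exits in the fortnight / participants still in the program    */
/* at the start of that fortnight (= rows present in long data). */
proc sql;
  create table raw_hazard as
  select GROUP,
         fortnight,
         count(*)   as n_at_risk,   /* rows = participants at risk  */
         sum(exit)  as n_exits,     /* assumes exit is coded 0/1    */
         mean(exit) as exit_rate format=percent8.2
  from exits
  group by GROUP, fortnight
  order by GROUP, fortnight;
quit;
```

Because every participant contributes a row only while still in the program, the row count per fortnight is the at-risk denominator, so no separate bookkeeping of earlier exits is needed.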

 

Below on the left is a graph, plotted by taking the fortnightly 'survival' rate from phreg output, turning it into a failure rate (1-survival), and then calculating a 'hazard' rate each fortnight for each group. You can see this hazard ratio between the groups is essentially constant. Is my interaction variable wrongly specified? I want the hazard ratio between the groups to be free to vary at each fortnight.
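
The conversion described here can be sketched as a short DATA step. It assumes the BASELINE output data set `exit_out` holds one row per group and time point, with the group in `GROUP`, the time in `stop` and the adjusted survival in `Survival`; those variable names are assumptions and worth checking with PROC CONTENTS. The names `hazard_from_surv`, `prev_surv` and `hazard` are hypothetical. The hazard for a fortnight is taken as the proportional drop in survival since the previous time point, that is 1 - S(t)/S(t-1).

```sas
/* Order the adjusted survival estimates by group and time */
proc sort data=exit_out;
  by GROUP stop;
run;

data hazard_from_surv;
  set exit_out;
  by GROUP;
  prev_surv = lag(Survival);           /* survival at the previous time point      */
  if first.GROUP then prev_surv = 1;   /* everyone is still in the program at t=0  */
  if prev_surv > 0 then
    hazard = 1 - Survival / prev_surv; /* share of survivors lost in this interval */
run;
```

Note that LAG is called on every row and only overwritten at the start of each group; calling it conditionally would make it return values from the wrong row.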

[Images: capture3.PNG, capture4.PNG]

 

JacobSimonsen
Barite | Level 11

How did you create the "fortnight" variable? It is not mathematically correct to use start and stop to create the independent variables, because the rates would then depend on future information. Instead, if you want an interaction with time, you should create the time variable inside proc phreg, something like this:

 

proc phreg data=simulation;
  period1_a=(t<=5)*a;
  period2_a=(t>5)*a;
  model t=period1_a period2_a/rl;
run;

 

There is also another way to make an interaction with time in phreg. You first need to aggregate your data on the risk sets. Once that is done, it becomes "legal" to use the time variable to make the independent variables. I have made a macro for the aggregation step (coxaggregate). Maybe you will find it useful. Here is a simple example of how it can be used for making an interaction with time. (Time is here an effect modifier on the covariate "a", such that the true effect of a is 1.5 before time=5 and 2 later on.) Notice that the two phregs give the same result, but only by aggregating on the risk sets can you use the hazardratio statement.

 

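data simulation;
  do i=1 to 1000;
    a=mod(i,2);                     /* covariate alternates between 1 and 0        */
    rate1=0.1*exp(log(1.5)*a);      /* hazard up to t=5: hazard ratio 1.5 for a=1  */
    rate2=0.1*exp(log(2)*a);        /* hazard after t=5: hazard ratio 2 for a=1    */
    t=rand('exponential',1/rate1);  /* second argument is the scale, i.e. 1/rate   */
    if t>5 then t=5+rand('exponential',1/rate2);
    event=1;                        /* every subject has an event, no censoring    */
    output;
  end;
  keep t a event;
run;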

proc phreg data=simulation;
  period1_a=(t<=5)*a;
  period2_a=(t>5)*a;
  model t=period1_a period2_a/rl;
run;


%coxaggregate(data=simulation,output=coxout,entry=0,exit=t,event=event,covariate=a)

data coxout;
  set coxout;
  timegroup=(time<=5);
run;
proc phreg data=coxout nosummary;
  class a(ref="0") timegroup/param=glm ;
  model dummytime*dummytime(2)= timegroup*a;
  hazardratio a/at(timegroup=all);
  strata time ;
  freq weight;
run;
zedstar
Calcite | Level 5

Hi, thank you for your response. I have some clarifications and questions below.

How did you create the "fortnight" variable? It is not mathematically correct to use start and stop to create the independent variables, because the rates would then depend on future information. Instead, if you want an interaction with time, you should create the time variable inside proc phreg, something like this:

proc phreg data=simulation;
  period1_a=(t<=5)*a;
  period2_a=(t>5)*a;
  model t=period1_a period2_a/rl;
run;

I did use 'stop' to create the fortnight variable, but I don't know what you mean by 'introduce some future dependency'. As I understand it, the fortnight being considered does not depend on some event in the future, only on how many fortnights have already passed since observation began. I am using the counting process syntax because I have other time-dependent variables, but I don't expect these to interact with time; only the group variable, which does not change, interacts with time.

 

In the above code, 't' seems to be the survival variable (in time units), and 'a' is the grouping variable that interacts with time. But I don't want to break up time into two parts, but into 26 parts.

 

Here is a simple example of how it can be used for making an interaction with time. (Time is here an effect modifier on the covariate "a", such that the true effect of a is 1.5 before time=5 and 2 later on.) Notice that the two phregs give the same result, but only by aggregating on the risk sets can you use the hazardratio statement.

This implies I need to estimate what I believe the hazard ratio to be during each time period?

JacobSimonsen
Barite | Level 11

It is not exactly clear to me what "start" and "stop" are. If you have divided your survival time into subintervals for which "start" and "stop" are the endpoints, then I think you did it right (and you can forget my comment).

But if "stop" is the survival time and you used it to create "fortnight", then the predictors depend on future events. In Cox regression you go through the timeline, and at each time point you may construct the rate using only what has happened in the past. Using the survival time to create the rate before the event time amounts to conditioning on the future.

Alternatively, the statement that creates the rate can be specified inside proc phreg. The difference is that it is then not the survival time of each observation that is used, but a running time value (depending on where on the timeline the rate is being calculated).
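
As a rough sketch of that second approach applied to fortnights: the data set `exits_wide`, the 0/1 indicator `group_b`, the three window variables and the cut points 12 and 17 are all hypothetical, chosen only to illustrate the idea. It assumes one row per participant, with `t` the number of fortnights until exit or censoring and `exit` equal to 1 for an exit, so it does not cover other time-dependent covariates held in counting-process format.

```sas
proc phreg data=exits_wide;
  model t*exit(0) = AGE grp_early grp_window grp_late / ties=efron rl;
  /* Inside PHREG, t is the running event time of the risk set   */
  /* being evaluated, not each participant's own exit time, so   */
  /* these terms never look into the future.                     */
  grp_early  = group_b * (t < 12);              /* before the requirements */
  grp_window = group_b * (t >= 12 and t <= 17); /* around the six-month mark */
  grp_late   = group_b * (t > 17);              /* after the window        */
run;
```

Each coefficient is then a separate log hazard ratio for Group B versus Group A within its window; finer windows would need one such term per window.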

zedstar
Calcite | Level 5

It is not exactly clear to me what "start" and "stop" are. If you have divided your survival time into subintervals for which "start" and "stop" are the endpoints, then I think you did it right (and you can forget my comment).

But if "stop" is the survival time and you used it to create "fortnight", then the predictors depend on future events. In Cox regression you go through the timeline, and at each time point you may construct the rate using only what has happened in the past. Using the survival time to create the rate before the event time amounts to conditioning on the future.

Thanks again for the response  

 

"Start" and "stop" mark the endpoints of subintervals of time, determined only by the date (every two weeks, a new interval is created on a new row). Intervals are created until either

  • the interval (covering two weeks) contains the survival event, in which case the row for this interval shows that the event happened; or
  • 26 fortnights have been counted without the event occurring, in which case the last row is right-censored.

So if somebody left in three fortnights, they would have three rows, but if somebody never left, they'd have 26 rows. If somebody left in the third fortnight, their survival time would be equal to the 'stop' variable on that row only. The previous rows would show that the survival event did not occur in that time period.
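
The layout just described could be generated roughly as follows. This is a sketch under assumptions: `persons` is a hypothetical one-row-per-participant data set with an `id` and an `exit_fortnight` (the fortnight in which the participant left, missing if they never left), and `exits_long` is a made-up output name.

```sas
data exits_long;
  set persons;
  do fortnight = 1 to 26;
    start = fortnight - 1;                  /* interval is (start, stop] in fortnights */
    stop  = fortnight;
    exit  = (exit_fortnight = fortnight);   /* 1 only on the row containing the exit   */
    output;
    if exit then leave;                     /* no rows after the participant has left  */
  end;
run;
```

A participant who leaves in the third fortnight gets three rows with `exit` equal to 0, 0, 1; one who never leaves gets 26 rows, all with `exit` equal to 0, so the last row is right-censored.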

 

 
