rajvaidya
Calcite | Level 5

I am using PROC PHREG to fit a proportional hazards model. My covariates are nominal variables (e.g. name of city, name of assembly plant, etc.). My data is censored.

I am trying to use time-dependent covariates to check the proportional hazards assumption, as shown below:

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 / TIES=EFRON;

CITY_T1 = TIME*(CITY="KOLKATA");

PLANT_T1=TIME*(PLANT="A2");

RUN;

The above code runs for a long time and then quits saying that the disk is out of space.

What am I doing wrong?

13 REPLIES
Reeza
Super User

Can you try creating the variable outside the model and then running it inside and seeing if that works?

I'm not sure, but you should also check whether the interaction has to be defined for each level (i.e. for the Kolkata level of city, as you're doing) rather than for the city variable as a whole. I don't remember how this is handled for categorical/nominal variables.

You can also try the ASSESS statement (assess ph / resample;), which tests the PH assumption and provides p-values for it as well.

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT / TIES=EFRON;

assess ph/resample;

RUN;

rajvaidya
Calcite | Level 5

I will try creating the variable outside. I would have a new variable called CITY_T1 that has the value TIME for records where CITY is KOLKATA and 0 for records where CITY is not KOLKATA. I think that is how CITY_T1 should be coded.

The assess ph option resulted in the same issue (disk out of space and no results).

Reeza
Super User

How many levels does your variable have for City and Plant?

And how many observations?

rajvaidya
Calcite | Level 5

I have disguised the actual problem, so the variables are not really city and plant. My problem would have about 5 levels for City and about 6 for Plant. There would be about 40,000 records.

Reeza
Super User

I'm assuming the model runs fine without the ASSESS statement or the time-dependent variables?

rajvaidya
Calcite | Level 5

Yes, this code works just fine:

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT / TIES=EFRON;

RUN;

Reeza
Super User

There's a note in the user guide:

"Here, log(t) is used instead of t to avoid numerical instability in the computation. The constant, 5.4, is the average of the logs of the survival times and is included to improve interpretability."

Try doing log(time) rather than time?

rajvaidya
Calcite | Level 5

Interesting. Wonder how I missed that. Let me try that and update you tomorrow. Do not have access to SAS to test it immediately.

Reeza
Super User

The constant mentioned is based on the example they were doing, so you could use something relevant to your data
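A minimal sketch of that idea, assuming the data set and variable names used earlier in this thread and that all TIME values are positive. The macro variable name mean_logt is made up for illustration; it holds the average of log(TIME) for your own data and takes the place of the 5.4 from the user guide example.

```sas
/* Average of the log survival times, stored in a macro variable.
   The name mean_logt is hypothetical. */
proc sql noprint;
   select mean(log(TIME)) into :mean_logt trimmed
   from INPUT
   where TIME > 0;
quit;

proc phreg data=INPUT;
   class CITY (ref="BOMBAY") PLANT (ref="A1");
   model TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 / ties=efron;
   /* Centered log-time interactions for one level of each covariate */
   CITY_T1  = (CITY="KOLKATA") * (log(TIME) - &mean_logt);
   PLANT_T1 = (PLANT="A2")     * (log(TIME) - &mean_logt);
run;
```

Centering only shifts the point at which the main effects are interpreted; it does not change the test of the interaction terms.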

rajvaidya
Calcite | Level 5

I tried this and got the same error:

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 / TIES=EFRON;

CITY_T1 = LOG(TIME)*(CITY="KOLKATA");

PLANT_T1=LOG(TIME)*(PLANT="A2");

RUN;

Instead I followed your suggestion and created CITY_T1 and PLANT_T1 outside:

DATA INPUT1;

SET INPUT;

IF PLANT = "A2" THEN PLANT_T1 = TIME;

ELSE PLANT_T1 = 0;

IF CITY = "KOLKATA" THEN CITY_T1 = TIME;

ELSE CITY_T1 = 0;

RUN;

PROC PHREG DATA = INPUT1;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 / TIES=EFRON;

RUN;

This worked. However, both PLANT (KOLKATA) and PLANT_T1 now come out significant (Pr > ChiSq < 0.0001). What does this mean?

rajvaidya
Calcite | Level 5

I tried the assess option as you had suggested:

ODS GRAPHICS ON;

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") ;

MODEL TIME*CENSOR(0) = CITY / TIES=EFRON;

ASSESS PH / RESAMPLE=2;

RUN;

ODS GRAPHICS OFF;

I reduced the covariates to CITY only, and in the RESAMPLE option I gave 2 (not sure if it gives any meaningful result for such a small number). I did get results. If I remove the ODS options completely, I do not get the charts but I save a minute of computation time. In both cases I do get the Supremum Test of Proportionality results.

It still takes about 10 minutes. As I said, I have anywhere between 40,000 and half a million records at times. If this is the cause of the slowdown, I will not be able to do much about the number of records. Also, does it make any sense to do the above analysis one covariate at a time (would that not mislead the final interpretation)?

Reeza
Super User

I'd be in favour of using the ASSESS statement rather than the time-dependent variable because it tests all the different levels; otherwise you need to test each one at a time.

Are the p-values for that significant as well? If so, it means you have a time-dependent effect, i.e. your proportional hazards assumption is violated. So whatever the city variable actually is, its effect is changing over time.

And no, running this with one variable at a time doesn't make any sense.

You should talk to SAS tech support and see if there's any way to optimize this, and what hardware you'd need to run this properly.
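For reference, a sketch of the ASSESS check with both covariates in the model at once, using the names from earlier in the thread. RESAMPLE=1000 is the documented default number of simulated paths, and SEED= fixes the random number stream so the p-values are reproducible; the seed value here is arbitrary. Run time on a data set of your size is something I can't predict.

```sas
proc phreg data=INPUT;
   class CITY (ref="BOMBAY") PLANT (ref="A1");
   model TIME*CENSOR(0) = CITY PLANT / ties=efron;
   /* Supremum test of the PH assumption for each model parameter */
   assess ph / resample=1000 seed=12345;
run;
```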

rajvaidya
Calcite | Level 5

Thanks Reeza. I will check with SAS tech support.

Yes, when I run the following:

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") ;

MODEL TIME*CENSOR(0) = CITY / TIES=EFRON;

ASSESS PH / RESAMPLE=2;

RUN;

I do find that the Supremum Test of Proportionality results show significance for all CITY levels.

For the time-dependent variable, what if I were to create time-dependent variables for all levels:

DATA INPUT1;

SET INPUT;

IF PLANT = "A2" THEN PLANT_T1 = TIME;

ELSE PLANT_T1 = 0;

IF PLANT = "A3" THEN PLANT_T2 = TIME;

ELSE PLANT_T2 = 0;

IF CITY = "KOLKATA" THEN CITY_T1 = TIME;

ELSE CITY_T1 = 0;

IF CITY = "CHENNAI" THEN CITY_T2 = TIME;

ELSE CITY_T2= 0;

RUN;

PROC PHREG DATA = INPUT1;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 CITY_T2 PLANT_T2/ TIES=EFRON;

RUN;

In the above example I assumed only 3 levels for each covariate. Would this not account for all levels? Yes, it does require preprocessing at my end, and the number of variables in the MODEL statement will be doubled.
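For comparison, here is a sketch of the same idea written with programming statements inside PROC PHREG, again assuming only 3 levels per covariate. The difference from the DATA step version is that statements inside the procedure are re-evaluated at each event time for every record still at risk, whereas a variable built in a DATA step is fixed at each record's own TIME. The TEST labels CITY_PH and PLANT_PH are names I made up; each TEST statement gives one joint test that all interaction coefficients for that covariate are zero.

```sas
proc phreg data=INPUT;
   class CITY (ref="BOMBAY") PLANT (ref="A1");
   model TIME*CENSOR(0) = CITY PLANT CITY_T1 CITY_T2 PLANT_T1 PLANT_T2
         / ties=efron;

   /* One time interaction per non-reference level */
   CITY_T1  = (CITY="KOLKATA") * TIME;
   CITY_T2  = (CITY="CHENNAI") * TIME;
   PLANT_T1 = (PLANT="A2") * TIME;
   PLANT_T2 = (PLANT="A3") * TIME;

   /* Joint tests of proportionality, one per covariate */
   CITY_PH:  test CITY_T1, CITY_T2;
   PLANT_PH: test PLANT_T1, PLANT_T2;
run;
```

This is the form that ran out of disk space for me with a single interaction per covariate, so it may not be practical on the full data.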
