## Proportionality assumption for nominal variables

Occasional Contributor
Posts: 12


I am using PROC PHREG to fit a proportional hazards model. My covariates are nominal variables (e.g. city name, assembly plant name). My data are censored.

I am trying to check the proportional hazards assumption using time-dependent covariates, as shown below:

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 / TIES=EFRON;

CITY_T1 = TIME*(CITY="KOLKATA");

PLANT_T1=TIME*(PLANT="A2");

RUN;

The code above runs for a long time and then stops with an error saying that the disk is out of space.

What am I doing wrong?

Super User
Posts: 23,724


Can you try creating the variable outside the procedure, then using it in the model, and seeing if that works?

I'm not sure, but you should also verify whether the interaction needs to be built for each level (i.e. for the KOLKATA level of CITY, as you're doing) rather than for the CITY variable as a whole. I don't remember how this is handled for categorical/nominal variables.

You can also try the ASSESS PH / RESAMPLE; statement, which tests the PH assumption and provides p-values as well.

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT / TIES=EFRON;

ASSESS PH / RESAMPLE;

RUN;

Occasional Contributor
Posts: 12


I will try creating the variable outside the procedure. I would have a new variable called CITY_T1 that takes the value of TIME for records where CITY is KOLKATA and 0 for records where CITY is not KOLKATA. I think that is how CITY_T1 should be coded.

The assess ph option resulted in the same issue (disk out of space and no results).

Super User
Posts: 23,724


How many levels does your variable have for City and Plant?

And how many observations?

Occasional Contributor
Posts: 12


I have disguised the actual problem, so the variables are not really city and plant. The real data have about 5 levels for CITY and about 6 for PLANT, with about 40,000 records.

Super User
Posts: 23,724


I assume the model runs fine without the ASSESS statement or the time-dependent variables?

Occasional Contributor
Posts: 12


Yes, this code works just fine:

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT / TIES=EFRON;

RUN;

Super User
Posts: 23,724


There's a note in the user guide:

"Here, log(t) is used instead of t to avoid numerical instability in the computation. The constant, 5.4, is the average of the logs of the survival times and is included to improve interpretability."

Could you try using log(time) rather than time?

Occasional Contributor
Posts: 12


Interesting. I wonder how I missed that. Let me try it and update you tomorrow; I do not have access to SAS right now to test it.

Super User
Posts: 23,724


The constant mentioned there comes from the example they were working through, so you could use a value relevant to your own data.
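To make the role of that constant concrete, here is a sketch of the model being fitted when a log-time interaction is added for a single level. The symbols $\beta$, $\gamma$, $c$ and the indicator $x$ are my own notation, not the user guide's; $x = 1$ for, say, CITY = KOLKATA and $x = 0$ otherwise.

```latex
h(t \mid x) = h_0(t)\,\exp\bigl(\beta x + \gamma\, x\,(\log t - c)\bigr),
\qquad
\frac{h(t \mid x = 1)}{h(t \mid x = 0)} = \exp\bigl(\beta + \gamma\,(\log t - c)\bigr)
```

Proportional hazards for that level corresponds to $\gamma = 0$, so the test on the interaction term is the one that matters. Subtracting $c$ (for example the mean of log(TIME) in your data) only changes the meaning of $\beta$, which becomes the log hazard ratio at $\log t = c$; it does not change the estimate or the test of $\gamma$.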

Occasional Contributor
Posts: 12


I tried this and got the same error:

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 / TIES=EFRON;

CITY_T1 = LOG(TIME)*(CITY="KOLKATA");

PLANT_T1=LOG(TIME)*(PLANT="A2");

RUN;

I then created the variables outside the procedure, in a DATA step:

DATA INPUT1;

SET INPUT;

IF PLANT = "A2" THEN PLANT_T1 = TIME;

ELSE PLANT_T1 = 0;

IF CITY = "KOLKATA" THEN CITY_T1 = TIME;

ELSE CITY_T1 = 0;

RUN;

PROC PHREG DATA = INPUT1;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 / TIES=EFRON;

RUN;

This worked. However, both PLANT (KOLKATA) and PLANT_T1 now come out significant (Pr > ChiSq < 0.0001). What does this mean?

Occasional Contributor
Posts: 12


I tried the assess option as you had suggested:

ODS GRAPHICS ON;

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") ;

MODEL TIME*CENSOR(0) = CITY / TIES=EFRON;

ASSESS PH / RESAMPLE=2;

RUN;

ODS GRAPHICS OFF;

I reduced the covariates to CITY only, and I set RESAMPLE=2 (I am not sure such a small number gives any meaningful result). I did get results. If I remove the ODS statements completely, I do not get the charts, but I save about a minute of computation time. In both cases I get the Supremum Test of Proportionality results, but it still takes about 10 minutes.

As I said, I have anywhere from 40,000 to half a million records. If that is the cause of the slowdown, there is not much I can do about the number of records. Also, does it make any sense to run the above analysis one covariate at a time, or would that mislead the final interpretation?
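On the RESAMPLE=2 question: as far as I understand the ASSESS statement, the supremum-test p-value is the share of simulated paths whose maximum exceeds the observed one, so with only 2 paths it can take just the values 0, 0.5, or 1. A sketch with more paths and a fixed seed follows. The value 1000 is the documented default when RESAMPLE is given without a number; the seed value is an arbitrary choice of mine.

```sas
/* Sketch only: same single-covariate model, more simulated paths.   */
/* SEED= fixes the random stream so reruns give the same p-values.   */
proc phreg data=input;
   class city (ref="BOMBAY");
   model time*censor(0) = city / ties=efron;
   assess ph / resample=1000 seed=20240101;  /* seed value is arbitrary */
run;
```

Run time may grow with the number of paths, so it could be worth timing an intermediate value first.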

Super User
Posts: 23,724


I'd be in favour of using the ASSESS statement rather than the time-dependent variables, because it tests all the levels at once; otherwise you need to test each level separately.

Are the p-values for that significant as well? If so, the proportional hazards assumption is violated: the effect of that variable is time-dependent. So whatever the CITY variable actually is, its effect is changing over time.

And no, running this with one variable at a time doesn't make any sense.

You should talk to SAS tech support to see whether there's any way to optimize this, and what hardware you'd need to run it properly.

Occasional Contributor
Posts: 12


Thanks Reeza. I will check with SAS tech support.

Yes, when I run the following:

PROC PHREG DATA = INPUT;

CLASS CITY (REF="BOMBAY") ;

MODEL TIME*CENSOR(0) = CITY / TIES=EFRON;

ASSESS PH / RESAMPLE=2;

RUN;

I do find that the Supremum Test of Proportionality results show significance for all CITY levels.

As for the time-dependent variables, suppose I create one for every level:

DATA INPUT1;

SET INPUT;

IF PLANT = "A2" THEN PLANT_T1 = TIME;

ELSE PLANT_T1 = 0;

IF PLANT = "A3" THEN PLANT_T2 = TIME;

ELSE PLANT_T2 = 0;

IF CITY = "KOLKATA" THEN CITY_T1 = TIME;

ELSE CITY_T1 = 0;

IF CITY = "CHENNAI" THEN CITY_T2 = TIME;

ELSE CITY_T2= 0;

RUN;

PROC PHREG DATA = INPUT1;

CLASS CITY (REF="BOMBAY") PLANT (REF="A1");

MODEL TIME*CENSOR(0) = CITY PLANT CITY_T1 PLANT_T1 CITY_T2 PLANT_T2 / TIES=EFRON;

RUN;

In the example above I assumed only 3 levels for each covariate. Would this not account for all levels? Admittedly, it requires preprocessing on my end, and the number of variables in the MODEL statement roughly doubles.
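One way to cover every non-reference level at once, without preprocessing, is to define the interactions with programming statements inside PROC PHREG and test them jointly with a TEST statement. This is only a sketch under the three-level assumption above; the labels PH_CITY and PH_PLANT are names I made up, and log(TIME) assumes TIME > 0. Keep in mind that statements inside PROC PHREG are re-evaluated at every event time for each subject still at risk, whereas a DATA step variable is fixed at each record's own TIME, so the two approaches are not equivalent. Given the disk-space error reported earlier in this thread, this may still be too heavy for the full data.

```sas
proc phreg data=input;
   class city (ref="BOMBAY") plant (ref="A1");
   model time*censor(0) = city plant
                          city_t1 city_t2 plant_t1 plant_t2 / ties=efron;

   /* log(time) interactions, one per non-reference level (assumes time > 0) */
   city_t1  = log(time)*(city  = "KOLKATA");
   city_t2  = log(time)*(city  = "CHENNAI");
   plant_t1 = log(time)*(plant = "A2");
   plant_t2 = log(time)*(plant = "A3");

   /* joint tests that all interaction coefficients for a variable are zero */
   ph_city:  test city_t1,  city_t2;
   ph_plant: test plant_t1, plant_t2;
run;
```

A significant joint test for a variable would suggest that proportionality fails for at least one of its levels.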
