Hi,
I want to assess how a specific regulatory intervention (e.g. tax relief) affects firms' time to default using survival analysis. I have around 130,000 firms, of which about 6% default.
Each firm can receive the treatment (regulatory intervention) at a different point in time, and more than once, so I use a time-dependent covariate approach as in this paper and reshape my original dataset into the "counting process style".
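A minimal sketch of one way to obtain the counting-process layout, assuming (the post does not say this) that the original data are a firm-quarter panel with one row per firm and quarter. In that case each row already is a one-quarter interval, so tstart and tstop follow directly from a quarter counter. The dataset panel and the variables firm_id, quarter and default_flag are hypothetical names, and the three rows are made-up illustration data:

```sas
/* Hypothetical firm-quarter panel: one row per firm per quarter.       */
/* quarter = 1, 2, 3, ... counts quarters since the start of follow-up. */
data panel;
  input firm_id quarter treatment_variable default_flag;
  datalines;
1 1 0 0
1 2 1 0
1 3 0 1
;
run;

/* Counting-process layout: each row becomes the interval (tstart, tstop]. */
data my_data;
  set panel;
  tstart = quarter - 1;   /* interval opens at the end of the previous quarter */
  tstop  = quarter;       /* interval closes at the end of this quarter        */
  endpt  = default_flag;  /* 1 = default at tstop, 0 = censored                */
run;
```

Consecutive quarters with identical covariate values could also be merged into one longer interval; for a Cox model this leaves the risk sets, and therefore the estimates, unchanged, so the one-row-per-quarter layout mainly costs storage.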
In my model I also want to control for the industry in which each firm operates, because the treatment is to some extent linked to industry.
So, following the paper linked above, I use the following code:
```sas
PROC PHREG DATA = my_data;
  CLASS industry_code;                  /* treat the NACE code as categorical  */
  MODEL (tstart, tstop)*endpt(0) = treatment_variable industry_code
        / TIES = EFRON RL;              /* Efron ties; RL = hazard-ratio CIs   */
RUN;
```
where:
- tstart and tstop - the start and end of each time interval in the counting-process layout
- endpt - indicates whether the interval ends in default; endpt(0) tells PHREG that the value 0 marks intervals that do not end in default (censored)
- industry_code - the 4-digit NACE code of the firm's industry
Now I have two questions:
1. Is it correct to put the NACE industry code in both the CLASS and the MODEL statement? The code is just a string of digits with no numeric meaning; should I transform it into a set of dummy variables instead?
2. Regarding treatment_variable: one way to specify it is as the value of the treatment in a given tstart-tstop period. What about specifying it instead as a dummy indicating whether the firm was ever subject to the treatment? This would be time-fixed rather than time-varying. Can I include it in a model with the counting-process setup? The dummy would be created as follows:
firm 1: was subject to regulatory intervention twice, in 2010Q4 and 2011Q1, and did not default -> treatment dummy = 1
firm 2: was not subject to regulatory intervention and did not default -> treatment dummy = 0
firm 3: was subject to regulatory intervention once in 2010Q3 and went into default in 2011Q4 -> treatment dummy = 1
firm 4: was not subject to regulatory intervention and went into default in 2012Q1 -> treatment dummy = 0
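The time-fixed dummy described above could be derived from the counting-process data by taking, per firm, the maximum of a "treated in this interval" flag. This is a sketch under several assumptions: firm_id is a hypothetical firm identifier (the post does not name one), time is counted in quarters from the start of 2010Q1, non-defaulting firms are assumed censored at the end of 2012Q4, and treatment_variable is coded 1 in quarters with an intervention and 0 otherwise. The rows encode firms 1 to 4 from the list above. It only shows how the dummy would be built, not whether it belongs in the model:

```sas
/* Hypothetical counting-process rows for firms 1-4 from the list above. */
/* Time = quarters since the start of 2010Q1, so 2010Q4 = (3,4], etc.   */
data my_data;
  input firm_id tstart tstop treatment_variable endpt;
  datalines;
1 0 3 0 0
1 3 5 1 0
1 5 12 0 0
2 0 12 0 0
3 0 2 0 0
3 2 3 1 0
3 3 8 0 1
4 0 9 0 1
;
run;

/* ever_treated = 1 if the firm has any interval with treatment > 0. */
proc sql;
  create table my_data_ever as
  select a.*, b.ever_treated
  from my_data as a
       inner join
       (select firm_id, max(treatment_variable > 0) as ever_treated
        from my_data
        group by firm_id) as b
       on a.firm_id = b.firm_id
  order by a.firm_id, a.tstart;
quit;
```

The resulting ever_treated is constant within each firm (1 for firms 1 and 3, 0 for firms 2 and 4), so it is repeated on every interval of that firm in my_data_ever.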