Hi, all.
I am currently running a difference-in-differences (DID) analysis to examine the effect of a policy intervention on depression.
I have a few questions about the control variables.
The current data format (example) is as follows.
Depression, age, income: continuous variables
Sex, work: dummy variables
pid | post | policy | depression | sex | age | income | work |
1 | 0 | 1 | 16 | 1 | 50 | 200 | 1 |
1 | 1 | 1 | 15 | 1 | 50 | 300 | 1 |
2 | 0 | 0 | 20 | 0 | 30 | 500 | 1 |
2 | 1 | 0 | 35 | 0 | 30 | 400 | 0 |
3 | 0 | 0 | 17 | 0 | 20 | 600 | 0 |
3 | 1 | 0 | 50 | 0 | 20 | 900 | 1 |
4 | 0 | 1 | 35 | 1 | 40 | 800 | 0 |
4 | 1 | 1 | 25 | 1 | 40 | 400 | 0 |
1. In the above data format, should I recode the time-varying control variables (income, work) so that the pre-period value replaces the post-period value (re-format 1), or so that the post-period value replaces the pre-period value (re-format 2)? Or should I use them in the current format?
RE-FORMAT (1)
pid | post | policy | depression | sex | age | income | Income_re | work | Work_re |
1 | 0 | 1 | 16 | 1 | 50 | 200 | 200 | 1 | 1 |
1 | 1 | 1 | 15 | 1 | 50 | 300 | 200 | 1 | 1 |
2 | 0 | 0 | 20 | 0 | 30 | 500 | 500 | 1 | 1 |
2 | 1 | 0 | 35 | 0 | 30 | 400 | 500 | 0 | 1 |
3 | 0 | 0 | 17 | 0 | 20 | 600 | 600 | 0 | 0 |
3 | 1 | 0 | 50 | 0 | 20 | 900 | 600 | 1 | 0 |
4 | 0 | 1 | 35 | 1 | 40 | 800 | 800 | 0 | 0 |
4 | 1 | 1 | 25 | 1 | 40 | 400 | 800 | 0 | 0 |
RE-FORMAT (2)
pid | post | policy | depression | sex | age | income | Income_re | work | Work_re |
1 | 0 | 1 | 16 | 1 | 50 | 200 | 300 | 1 | 1 |
1 | 1 | 1 | 15 | 1 | 50 | 300 | 300 | 1 | 1 |
2 | 0 | 0 | 20 | 0 | 30 | 500 | 400 | 1 | 0 |
2 | 1 | 0 | 35 | 0 | 30 | 400 | 400 | 0 | 0 |
3 | 0 | 0 | 17 | 0 | 20 | 600 | 900 | 0 | 1 |
3 | 1 | 0 | 50 | 0 | 20 | 900 | 900 | 1 | 1 |
4 | 0 | 1 | 35 | 1 | 40 | 800 | 400 | 0 | 0 |
4 | 1 | 1 | 25 | 1 | 40 | 400 | 400 | 0 | 0 |
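In case it helps to see how the two re-formats would be built, here is a rough sketch. It assumes LONG has exactly one pre (POST=0) and one post (POST=1) row per PID. The names LONG_RE, INCOME_PRE, WORK_PRE, INCOME_POST and WORK_POST are made up for illustration; they play the role of Income_re and Work_re in re-format (1) and re-format (2) respectively.

```sas
/* Sketch only: assumes one POST=0 row and one POST=1 row per PID in LONG. */
PROC SQL;
  CREATE TABLE LONG_RE AS
  SELECT a.*,
         b.income AS income_pre,   /* re-format (1): pre-period value on both rows  */
         b.work   AS work_pre,
         c.income AS income_post,  /* re-format (2): post-period value on both rows */
         c.work   AS work_post
  FROM LONG AS a
       LEFT JOIN LONG(WHERE=(post=0)) AS b ON a.pid = b.pid
       LEFT JOIN LONG(WHERE=(post=1)) AS c ON a.pid = c.pid
  ORDER BY a.pid, a.post;
QUIT;
```

With the example rows, pid 2 would get income_pre = 500 and income_post = 400 on both of its rows.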
2. If using the data in the current format is correct, how should I write the syntax for the control variables (income, work) in the DID regression? And how should I interpret the results? My current syntax is below.
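PROC MIXED DATA = LONG;
/* level 0 is the reference category for every CLASS variable */
CLASS POST(REF="0") POLICY(REF="0") SEX(REF="0") WORK(REF="0");
MODEL DEPRESSION = POST|POLICY SEX AGE INCOME WORK / SOLUTION;
/* adjusted cell means and the DID contrast on the POST*POLICY interaction */
LSMEANS POST*POLICY / DIFF;
ESTIMATE 'D-I-D' POST*POLICY 1 -1 -1 1;
/* random intercept for each person */
RANDOM INT / SUBJECT=PID TYPE=UN;
RUN;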
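As a sanity check on the 'D-I-D' contrast, the unadjusted DID can be computed directly from the four group-by-period means. This is only a sketch, assuming LONG holds the example rows shown above; RAW_DID is a made-up column name.

```sas
/* Sketch only: unadjusted DID = (exposed post - exposed pre) - (unexposed post - unexposed pre). */
/* CASE without ELSE returns missing, and MEAN() ignores missing values.                        */
PROC SQL;
  SELECT (MEAN(CASE WHEN policy=1 AND post=1 THEN depression END)
        - MEAN(CASE WHEN policy=1 AND post=0 THEN depression END))
       - (MEAN(CASE WHEN policy=0 AND post=1 THEN depression END)
        - MEAN(CASE WHEN policy=0 AND post=0 THEN depression END)) AS raw_did
  FROM LONG;
QUIT;
```

For the eight example rows this is (20 - 25.5) - (42.5 - 18.5) = -29.5. The model-based estimate will differ from this once the control variables are included.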
3. Also, how would I write the syntax if I want to control for changes in the control variables (income, work) over time?
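To make "changes" concrete: one possibility, which may or may not be appropriate, is a within-person change score (post minus pre) entered as a covariate. The sketch below assumes one pre and one post row per PID; LONG_CHG, D_INCOME and D_WORK are hypothetical names.

```sas
/* Sketch only: builds post-minus-pre change scores and adds them to the model. */
PROC SQL;
  CREATE TABLE LONG_CHG AS
  SELECT a.*,
         c.income - b.income AS d_income,  /* change in income, post minus pre          */
         c.work   - b.work   AS d_work     /* -1 = stopped work, 0 = no change, 1 = started */
  FROM LONG AS a
       LEFT JOIN LONG(WHERE=(post=0)) AS b ON a.pid = b.pid
       LEFT JOIN LONG(WHERE=(post=1)) AS c ON a.pid = c.pid
  ORDER BY a.pid, a.post;   /* sorted by PID, as SUBJECT=PID expects when PID is not a CLASS variable */
QUIT;

PROC MIXED DATA = LONG_CHG;
  CLASS POST(REF="0") POLICY(REF="0") SEX(REF="0");
  MODEL DEPRESSION = POST|POLICY SEX AGE D_INCOME D_WORK / SOLUTION;
  RANDOM INT / SUBJECT=PID;
RUN;
```

Note that the change scores are constant within a person, so in this form they only adjust for between-person differences.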
I apologize in advance for my poor English. Thanks!