Hi all,
I am running a difference-in-differences (DID) analysis to examine the effects of policy interventions on depression.
I have a few questions about the control variables.
Here is an example of the current data format:
Depression, age, income: continuous variables
Sex, work: dummy variables
pid | post | policy | depression | sex | age | income | work |
1 | 0 | 1 | 16 | 1 | 50 | 200 | 1 |
1 | 1 | 1 | 15 | 1 | 50 | 300 | 1 |
2 | 0 | 0 | 20 | 0 | 30 | 500 | 1 |
2 | 1 | 0 | 35 | 0 | 30 | 400 | 0 |
3 | 0 | 0 | 17 | 0 | 20 | 600 | 0 |
3 | 1 | 0 | 50 | 0 | 20 | 900 | 1 |
4 | 0 | 1 | 35 | 1 | 40 | 800 | 0 |
4 | 1 | 1 | 25 | 1 | 40 | 400 | 0 |
1. For the time-varying control variables (income, work), should I replace the post-period values with the pre-period values (RE-FORMAT 1), replace the pre-period values with the post-period values (RE-FORMAT 2), or keep the data in the current format?
RE-FORMAT (1)
pid | post | policy | depression | sex | age | income | Income_re | work | Work_re |
1 | 0 | 1 | 16 | 1 | 50 | 200 | 200 | 1 | 1 |
1 | 1 | 1 | 15 | 1 | 50 | 300 | 200 | 1 | 1 |
2 | 0 | 0 | 20 | 0 | 30 | 500 | 500 | 1 | 1 |
2 | 1 | 0 | 35 | 0 | 30 | 400 | 500 | 0 | 1 |
3 | 0 | 0 | 17 | 0 | 20 | 600 | 600 | 0 | 0 |
3 | 1 | 0 | 50 | 0 | 20 | 900 | 600 | 1 | 0 |
4 | 0 | 1 | 35 | 1 | 40 | 800 | 800 | 0 | 0 |
4 | 1 | 1 | 25 | 1 | 40 | 400 | 800 | 0 | 0 |
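One way to build RE-FORMAT (1) from the long data is BY-group processing: sort so that each person's POST=0 row comes first, then carry that row's INCOME and WORK forward with RETAIN. This sketch assumes every PID has exactly one POST=0 row and one POST=1 row, as in the example; the data set names LONG_SORTED and REFORMAT1 are hypothetical.

```sas
/* Sort so the pre-period row (POST=0) is first within each PID */
PROC SORT DATA=LONG OUT=LONG_SORTED;
  BY PID POST;
RUN;

/* RE-FORMAT (1): copy each person's pre-period income and work onto both rows.
   Assumes one POST=0 and one POST=1 row per PID. */
DATA REFORMAT1;
  SET LONG_SORTED;
  BY PID;
  RETAIN INCOME_RE WORK_RE;
  IF FIRST.PID THEN DO;     /* first row per PID is the POST=0 row */
    INCOME_RE = INCOME;
    WORK_RE   = WORK;
  END;
RUN;
```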
RE-FORMAT (2)
pid | post | policy | depression | sex | age | income | Income_re | work | Work_re |
1 | 0 | 1 | 16 | 1 | 50 | 200 | 300 | 1 | 1 |
1 | 1 | 1 | 15 | 1 | 50 | 300 | 300 | 1 | 1 |
2 | 0 | 0 | 20 | 0 | 30 | 500 | 400 | 1 | 0 |
2 | 1 | 0 | 35 | 0 | 30 | 400 | 400 | 0 | 0 |
3 | 0 | 0 | 17 | 0 | 20 | 600 | 900 | 0 | 1 |
3 | 1 | 0 | 50 | 0 | 20 | 900 | 900 | 1 | 1 |
4 | 0 | 1 | 35 | 1 | 40 | 800 | 400 | 0 | 0 |
4 | 1 | 1 | 25 | 1 | 40 | 400 | 400 | 0 | 0 |
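RE-FORMAT (2) can be sketched as a self-join in PROC SQL that attaches each person's POST=1 values to both of that person's rows. As above, this assumes one POST=1 row per PID; REFORMAT2 is a hypothetical data set name.

```sas
/* RE-FORMAT (2): copy each person's post-period income and work onto both rows.
   Assumes exactly one POST=1 row per PID. */
PROC SQL;
  CREATE TABLE REFORMAT2 AS
  SELECT A.*,
         B.INCOME AS INCOME_RE,
         B.WORK   AS WORK_RE
  FROM LONG AS A
       LEFT JOIN LONG(WHERE=(POST=1)) AS B
       ON A.PID = B.PID
  ORDER BY A.PID, A.POST;
QUIT;
```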
2. If keeping the data in the current format is correct, how should I specify the control variables (income, work) in the DID regression? And how should I interpret the results for the dummy variables? My current syntax is below.
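PROC MIXED DATA = LONG;
  CLASS POST(REF="0") POLICY(REF="0") SEX(REF="0") WORK(REF="0");
  /* DID model: POST and POLICY main effects plus POST*POLICY, adjusted for covariates */
  MODEL DEPRESSION = POST|POLICY SEX AGE INCOME WORK / SOLUTION;
  /* Least-squares means by period, by group, and for each period-by-group cell */
  LSMEANS POST POLICY POST*POLICY / DIFF;
  /* Difference-in-differences contrast on the four POST*POLICY cells */
  ESTIMATE 'D-I-D' POST*POLICY 1 -1 -1 1;
  /* Random intercept for each person (two measurements per PID) */
  RANDOM INT / SUBJECT=PID TYPE=UN;
RUN;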
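As a sanity check on the 2x2 logic behind the ESTIMATE coefficients 1 -1 -1 1, here is the unadjusted difference-in-differences of the raw cell means in the eight example rows (POLICY=1 is PIDs 1 and 4, POLICY=0 is PIDs 2 and 3). It ignores the covariates and the random intercept, so the covariate-adjusted estimate from PROC MIXED would generally differ.

```latex
\widehat{\mathrm{DID}}
  = \left(\bar{y}_{\text{policy}=1,\,\text{post}=1} - \bar{y}_{\text{policy}=1,\,\text{post}=0}\right)
  - \left(\bar{y}_{\text{policy}=0,\,\text{post}=1} - \bar{y}_{\text{policy}=0,\,\text{post}=0}\right)
  = \left(\tfrac{15+25}{2} - \tfrac{16+35}{2}\right) - \left(\tfrac{35+50}{2} - \tfrac{20+17}{2}\right)
  = (20 - 25.5) - (42.5 - 18.5)
  = -5.5 - 24
  = -29.5
```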
3. Also, how would I write the syntax if I want to control for changes in the control variables (income, work) over time?
I apologize in advance for my poor English. Thanks!