I'm conducting a difference-in-differences analysis to assess the impact of public housing demolition on violent crime. I'm using a multilevel negative binomial model fit with PROC GLIMMIX. My intervention group is census tracts where public housing was completely demolished, and my control group is census tracts where public housing underwent routine maintenance. Two features of the data could affect my analyses: 1) the demolished public housing developments range from 400 to 1,700 units; 2) since I am using population data from the 2000 and 2010 censuses, my census tract populations shift abruptly, sometimes dropping by as many as 1,400 people between censuses.
Question 1: Since the drop in the number of housing units would correlate with the population decline in a given census tract over time, I believe I need to weight the analysis to account for some tracts having larger public housing developments than others before demolition. How do I go about this? Is it as simple as adding WEIGHT=number_units_at_start to the RANDOM statement? Most of the articles I've found are on survey weighting and aren't helpful. Others are indecipherable to me, since I'm proficient but not fluent in biostatistics.
Question 2: I've also considered adjusting for the number of public housing units at the start of my timeframe. The control tracts lost housing units as well, but to routine maintenance, not wholesale demolition. How would adjusting for this differ in effect from adding a weighting variable?
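To make the two options concrete, here is a rough sketch of what I have in mind. It is untested and rests on assumptions: number_units_at_start is not in my dataset yet, work.toph_units and log_units are hypothetical names, and my reading of the documentation is that the WEIGHT= option on the RANDOM statement requires METHOD=QUADRATURE. The raw unit counts are used as weights without any scaling, which may well be wrong.

```sas
/* Sketch only: assumes number_units_at_start has been merged into the data.
   work.toph_units and log_units are hypothetical names. */
data work.toph_units;
   set main.toph_es_demo;
   log_units = log(number_units_at_start);  /* baseline development size, log scale */
run;

/* Option A (Question 2): adjust for baseline size as a fixed-effect covariate */
proc glimmix data=work.toph_units;
   class tract couplet exposed (ref="0") timeline (ref="-1") c00;
   model totcrime = timeline*pre timeline*post exposed
                    timeline*pre*exposed timeline*post*exposed c00 log_units
         / solution dist=negbin link=log offset=logpopyrs cl;
   random int / subject=couplet(tract);
run;

/* Option B (Question 1): weight each subject's contribution by baseline size.
   Assumption: subject-level weights need METHOD=QUADRATURE. */
proc glimmix data=work.toph_units method=quadrature(qpoints=5);
   class tract couplet exposed (ref="0") timeline (ref="-1") c00;
   model totcrime = timeline*pre timeline*post exposed
                    timeline*pre*exposed timeline*post*exposed c00
         / solution dist=negbin link=log offset=logpopyrs cl;
   random int / subject=couplet(tract) weight=number_units_at_start;
run;
```

Option A changes what the fixed effects are conditioned on, while Option B changes how much each subject counts toward the fit. I'm unsure which of these matches my goal.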
Here are examples of my code and data (there are 176 rows of data from 8 intervention and 8 control tracts in the full table; housing unit data are just examples since that variable isn't in my dataset at the moment). Thank you in advance for any advice.
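proc glimmix data=main.toph_es_demo PLOTS=pearsonpanel(marginal);
   class tract couplet exposed (ref="0") timeline (ref="-1") c00;
   /* Pre- and post-period time effects, exposure, and their interactions */
   model totcrime = timeline*pre timeline*post exposed
                    timeline*pre*exposed timeline*post*exposed c00
         / solution dist=negbin link=log offset=logpopyrs cl;
   /* Random intercept at the couplet(tract) level */
   random int / subject=couplet(tract) type=un cl s;
   /* Test whether the random-intercept variance is zero; the scale parameter is left free */
   covtest 'var(couplet(tract))=0' 0 .;
   /* Save marginal (no-BLUP) predictions and Pearson residuals */
   output out=residmodel1_negbin_TOPH_des
          pred(noblup)=predicted
          pearson(noblup)=pearson;
run;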
Tract | Year | Timeline | Tract_pop | Crime_count | pre | post | exposed | log_population_yrs | violent_crime_rate | number_units_at_start | couplet |
I | 1995 | -5 | 1205 | 81 | 1 | 0 | 1 | 7.09 | 67.22 | 1700 | 1 |
I | 1996 | -4 | 1205 | 59 | 1 | 0 | 1 | 7.09 | 48.96 | 1700 | 1 |
I | 1997 | -3 | 1205 | 26 | 1 | 0 | 1 | 7.09 | 21.58 | 1700 | 1 |
I | 1998 | -2 | 1205 | 30 | 1 | 0 | 1 | 7.09 | 24.90 | 1700 | 1 |
I | 1999 | -1 | 1205 | 24 | 0 | 0 | 1 | 7.09 | 19.92 | 1700 | 1 |
I | 2000 | 0 | 1205 | 26 | 0 | 1 | 1 | 7.09 | 21.58 | 1700 | 1 |
I | 2001 | 1 | 1205 | 18 | 0 | 1 | 1 | 7.09 | 14.94 | 1700 | 1 |
I | 2002 | 2 | 1205 | 10 | 0 | 1 | 1 | 7.09 | 8.30 | 1700 | 1 |
I | 2003 | 3 | 1205 | 13 | 0 | 1 | 1 | 7.09 | 10.79 | 1700 | 1 |
I | 2004 | 4 | 1205 | 11 | 0 | 1 | 1 | 7.09 | 9.13 | 1700 | 1 |
I | 2005 | 5 | 1991 | 17 | 0 | 1 | 1 | 7.60 | 8.54 | 1700 | 1 |
C | 1995 | -5 | 290 | 9 | 1 | 0 | 0 | 5.67 | 31.03 | 850 | 1 |
C | 1996 | -4 | 290 | 17 | 1 | 0 | 0 | 5.67 | 58.62 | 850 | 1 |
C | 1997 | -3 | 290 | 7 | 1 | 0 | 0 | 5.67 | 24.14 | 850 | 1 |
C | 1998 | -2 | 290 | 8 | 1 | 0 | 0 | 5.67 | 27.59 | 850 | 1 |
C | 1999 | -1 | 290 | 7 | 0 | 0 | 0 | 5.67 | 24.14 | 850 | 1 |
C | 2000 | 0 | 290 | 7 | 0 | 1 | 0 | 5.67 | 24.14 | 850 | 1 |
C | 2001 | 1 | 290 | 6 | 0 | 1 | 0 | 5.67 | 20.69 | 850 | 1 |
C | 2002 | 2 | 290 | 4 | 0 | 1 | 0 | 5.67 | 13.79 | 850 | 1 |
C | 2003 | 3 | 290 | 13 | 0 | 1 | 0 | 5.67 | 44.83 | 850 | 1 |
C | 2004 | 4 | 290 | 6 | 0 | 1 | 0 | 5.67 | 20.69 | 850 | 1 |
C | 2005 | 5 | 244 | 8 | 0 | 1 | 0 | 5.50 | 32.79 | 850 | 1 |
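
For reference, the two derived columns in the example rows follow from Tract_pop and Crime_count as sketched below. The dataset names work.example and work.example_derived are hypothetical; the variable names are the column headers from the table above.

```sas
/* Sketch: work.example is a hypothetical dataset holding the columns shown above. */
data work.example_derived;
   set work.example;
   /* Natural log of tract population; for example, log(1205) is about 7.09 */
   log_population_yrs = log(Tract_pop);
   /* Violent crimes per 1,000 residents; for example, 1000*81/1205 is about 67.22 */
   violent_crime_rate = 1000 * Crime_count / Tract_pop;
run;
```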