Weighting (proc glimmix), using weights correctly for DID analysis

DID · Posted 12-17-2022 12:52 PM

I'm conducting a difference-in-differences (DID) analysis to assess the impact of public housing demolition on violent crime. I'm fitting a multilevel negative binomial model with proc glimmix. My intervention group is census tracts where public housing was completely demolished, and my control group is census tracts where public housing underwent routine maintenance. Two features of the data could affect my analyses: 1) the demolished public housing developments range from 400 to 1,700 units; 2) because my tract population data come from the 2000 and 2010 censuses, the populations shift abruptly between censuses, sometimes dropping by as many as 1,400 people.

Question 1: Since the drop in the number of housing units would correlate with the population decline in a given census tract over time, I believe I need to weight the analysis to account for some tracts having larger public housing developments than others prior to demolition. How do I go about this? Is it as simple as adding WEIGHT=number_units_at_start to the RANDOM statement? Most of the articles I've found are on survey weighting and aren't helpful. Others are indecipherable to me, since I'm proficient but not fluent in biostatistics.
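To make Question 1 concrete, this is the change I have in mind. It is an untested sketch: number_units_at_start is not in my dataset yet, and the METHOD=QUADRATURE and EMPIRICAL=CLASSICAL options are my assumption, based on how the SAS documentation appears to set up weighted multilevel models. I may be misreading it, and those examples are about survey weights, which may not be what I need.

```sas
/* Untested sketch for Question 1: weight each tract by its baseline size. */
/* ASSUMPTION: number_units_at_start has been merged into the dataset.     */
/* ASSUMPTION: a subject-level WEIGHT= needs METHOD=QUADRATURE and         */
/*             EMPIRICAL=CLASSICAL, as in SAS's weighted multilevel example. */
proc glimmix data=main.toph_es_demo method=quadrature empirical=classical;
    class tract couplet exposed (ref="0") timeline (ref="-1") c00;
    model totcrime = timeline*pre timeline*post exposed
                     timeline*pre*exposed timeline*post*exposed c00
          / solution dist=negbin link=log offset=logpopyrs cl;
    /* WEIGHT= here applies to the subject level, i.e. one weight per tract */
    random int / subject=couplet(tract) weight=number_units_at_start;
run;
```

Everything except the method, empirical, and weight options is unchanged from my current model.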

Question 2: I've also entertained the idea of adjusting for the number of public housing units at the start of my timeframe. The control tracts lost housing units as well, but to routine maintenance, not wholesale demolition. How would this adjustment differ in effect from adding a weighting variable?
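For Question 2, the adjustment I'm picturing would enter the baseline unit count as a continuous covariate rather than as a weight. This is only a sketch under the assumption that number_units_at_start has been added to the dataset; I haven't decided whether it should be rescaled or log-transformed first.

```sas
/* Sketch for Question 2: adjust for baseline size instead of weighting. */
/* ASSUMPTION: number_units_at_start has been merged into the dataset.   */
proc glimmix data=main.toph_es_demo;
    class tract couplet exposed (ref="0") timeline (ref="-1") c00;
    model totcrime = timeline*pre timeline*post exposed
                     timeline*pre*exposed timeline*post*exposed c00
                     number_units_at_start  /* continuous: left out of CLASS */
          / solution dist=negbin link=log offset=logpopyrs cl;
    random int / subject=couplet(tract) type=un;
run;
```

Because the unit count is constant within a tract, it would act as a tract-level covariate alongside the random intercept.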

Here are examples of my code and data (there are 176 rows of data from 8 intervention and 8 control tracts in the full table; housing unit data are just examples since that variable isn't in my dataset at the moment). Thank you in advance for any advice.

proc glimmix data=main.toph_es_demo plots=pearsonpanel(marginal);
    class tract couplet exposed (ref="0") timeline (ref="-1") c00;
    model totcrime = timeline*pre timeline*post exposed
                     timeline*pre*exposed timeline*post*exposed c00
          / solution dist=negbin link=log offset=logpopyrs cl;
    random int / subject=couplet(tract) type=un cl s;
    covtest 'var(couplet(tract))=0' 0 .;
    output out=residmodel1_negbin_TOPH_des
           pred(noblup)=predicted
           pearson(noblup)=pearson;
run;

Tract	Year	Timeline	Tract_pop	Crime_count	pre	post	exposed	log_population_yrs	violent_crime_rate	number_units_at_start	couplet
I	1995	-5	1205	81	1	0	1	7.09	67.22	1700	1
I	1996	-4	1205	59	1	0	1	7.09	48.96	1700	1
I	1997	-3	1205	26	1	0	1	7.09	21.58	1700	1
I	1998	-2	1205	30	1	0	1	7.09	24.90	1700	1
I	1999	-1	1205	24	0	0	1	7.09	19.92	1700	1
I	2000	0	1205	26	0	1	1	7.09	21.58	1700	1
I	2001	1	1205	18	0	1	1	7.09	14.94	1700	1
I	2002	2	1205	10	0	1	1	7.09	8.30	1700	1
I	2003	3	1205	13	0	1	1	7.09	10.79	1700	1
I	2004	4	1205	11	0	1	1	7.09	9.13	1700	1
I	2005	5	1991	17	0	1	1	7.60	8.54	1700	1
C	1995	-5	290	9	1	0	0	5.67	31.03	850	1
C	1996	-4	290	17	1	0	0	5.67	58.62	850	1
C	1997	-3	290	7	1	0	0	5.67	24.14	850	1
C	1998	-2	290	8	1	0	0	5.67	27.59	850	1
C	1999	-1	290	7	0	0	0	5.67	24.14	850	1
C	2000	0	290	7	0	1	0	5.67	24.14	850	1
C	2001	1	290	6	0	1	0	5.67	20.69	850	1
C	2002	2	290	4	0	1	0	5.67	13.79	850	1
C	2003	3	290	13	0	1	0	5.67	44.83	850	1
C	2004	4	290	6	0	1	0	5.67	20.69	850	1
C	2005	5	244	8	0	1	0	5.50	32.79	850	1
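
The derived columns in the sample rows are consistent with log_population_yrs being the natural log of Tract_pop (each row is one tract-year) and violent_crime_rate being crimes per 1,000 residents. A sketch of that derivation, using the column names from the table above; the output dataset name is hypothetical, and it assumes those columns are present in the input dataset:

```sas
/* Sketch: rebuild the two derived columns from the raw counts.      */
/* work.toph_derived is a hypothetical output dataset name.          */
data work.toph_derived;
    set main.toph_es_demo;
    /* One row per tract-year, so population-years equals Tract_pop. */
    log_population_yrs = log(Tract_pop);           /* natural log    */
    violent_crime_rate = 1000 * Crime_count / Tract_pop;
run;
```

For the first row, log(1205) is about 7.09 and 1000 * 81 / 1205 is about 67.22, matching the table.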