DID

I'm conducting a difference-in-differences analysis to assess the impact of public housing demolition on violent crime, using a multilevel negative binomial model fitted with PROC GLIMMIX. My intervention group is census tracts where public housing was completely demolished, and my control group is census tracts where public housing underwent routine maintenance. Two features of the data could affect my analyses: 1) the demolished public housing developments range in size from 400 to 1,700 units; and 2) because I am using data from the 2000 and 2010 censuses, tract population figures shift abruptly between censuses, sometimes dropping by as many as 1,400 people.
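For reference, here is a sketch of how a log-population offset (offset=logpopyrs in the GLIMMIX code) enters a log-link count model. The notation is illustrative and does not reproduce GLIMMIX's internal parameterization: Y_it is the violent-crime count for tract i in year t, mu_it = E[Y_it | b_i], P_it is the tract population, and eta_it is the linear predictor (fixed effects plus the tract random intercept b_i). The offset's coefficient is fixed at 1, so the model describes the crime rate per resident, and between-census population changes are already built into the expected count. This rests on the assumption that expected crime scales proportionally with population.

```latex
\log \mu_{it} = \log P_{it} + \eta_{it}
\quad\iff\quad
\frac{\mu_{it}}{P_{it}} = \exp(\eta_{it})
```

For a fixed eta_it, halving P_it halves the expected count mu_it.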

 

Question 1: The loss of housing units should correlate with the population decline in each census tract over time. I therefore think I need to weight the analysis to reflect that some tracts had much larger public housing developments than others before demolition. How do I go about this? Is it as simple as adding WEIGHT=number_units_at_start to the RANDOM statement? Most of the articles I've found cover survey weighting and haven't helped. Others are indecipherable to me, since I'm proficient but not fluent in biostatistics.
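As background for this question, the general idea behind likelihood weighting can be written as a weighted log-likelihood. This is a generic sketch and does not describe how any particular PROC GLIMMIX weighting statement or option is implemented, because that depends on the syntax used and the estimation method, and the GLIMMIX documentation is the authority there. Here w_i is a hypothetical tract weight set to number_units_at_start, and l_i(theta) is tract i's log-likelihood contribution:

```latex
\ell_w(\boldsymbol\theta) = \sum_{i} w_i\,\ell_i(\boldsymbol\theta),
\qquad
\frac{w_{\text{1,700-unit tract}}}{w_{\text{400-unit tract}}} = \frac{1700}{400} = 4.25
```

Under this scheme, a tract with a 1,700-unit development would count 4.25 times as much as a tract with a 400-unit development when the parameters are estimated. Weighting changes how strongly each tract influences the estimates. By itself, it does not change the model for the expected crime count.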

 

Question 2: I've also considered adjusting for the number of public housing units at the start of my study period. The control tracts lost housing units as well, but through routine maintenance rather than wholesale demolition. How would this approach compare with adding a weighting variable?
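The covariate-adjustment option can be written as follows. This is a simplified two-period sketch rather than the full event-study model in the GLIMMIX code. It assumes U_i = number_units_at_start is constant within each tract, E_i = exposed, Post_t = post, and U-bar is the mean of U_i across tracts (centering is my assumption, not something in the original post):

```latex
\log \mu_{it} = \log P_{it} + \eta_{it}, \qquad
\eta_{it} = \beta_0 + \gamma\,\mathrm{Post}_t + \beta_E E_i
          + \theta\,\mathrm{Post}_t E_i
          + \delta\,(U_i - \bar U)
          + \psi\,(U_i - \bar U)\,\mathrm{Post}_t E_i
          + b_i

\text{DiD effect on the log-rate scale for exposed tract } i = \theta + \psi\,(U_i - \bar U)
```

The delta term does not change over time within a tract, so it cancels out of that tract's before/after rate ratio. A main effect of U_i therefore mainly absorbs baseline differences between tracts. The psi interaction lets the estimated demolition effect grow or shrink with development size, and centering makes theta the effect for an average-size development. Weighting, by contrast, leaves this mean structure unchanged and only alters how much each tract contributes to the fit.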

 

Below are my code and a sample of my data. The full table has 176 rows (8 intervention and 8 control tracts, 11 years each). The housing-unit values are placeholders, since that variable isn't in my dataset yet. Thank you in advance for any advice.

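proc glimmix data=main.toph_es_demo plots=pearsonpanel(marginal);
   class tract couplet exposed (ref="0") timeline (ref="-1") c00;
   /* Event-study DiD: one effect per timeline year before (pre=1) and  */
   /* after (post=1) the reference year, plus interactions with exposed */
   model totcrime = timeline*pre timeline*post exposed
                    timeline*pre*exposed timeline*post*exposed c00
                    / solution cl
                      dist   = negbin
                      link   = log
                      offset = logpopyrs;  /* log population, so crime is modeled as a rate */
   /* Random intercept for each tract within its matched couplet */
   random int / subject=couplet(tract) type=un cl s;
   /* Test whether the tract random-intercept variance is zero */
   covtest 'var(couplet(tract))=0' 0 .;
   /* Marginal (no-BLUP) predicted values and Pearson residuals */
   output out=residmodel1_negbin_TOPH_des
          pred(noblup)=predicted
          pearson(noblup)=pearson;
run;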
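Written out, the model in the code above corresponds roughly to the following event-study specification. The notation is mine and this is only a sketch. The c00 term is left generic because the post does not describe it, E_i = exposed, j indexes timeline values, and r_it = mu_it / P_it is the expected crime rate. Together, timeline*pre and timeline*post give one alpha_j for every timeline value except the reference (-1), and the three-way interactions give the matching theta_j:

```latex
Y_{it} \mid b_i \sim \operatorname{NegBin}(\mu_{it}, k), \qquad b_i \sim N(0, \sigma_b^2)

\log \mu_{it} = \log P_{it} + \beta_0 + \beta_E E_i
  + \sum_{j \neq -1} \bigl(\alpha_j + \theta_j E_i\bigr)\,\mathbf{1}\{\mathrm{timeline}_t = j\}
  + (\text{c00 terms}) + b_i

% t has timeline = j, s has timeline = -1; c00 held fixed; b_i cancels within a tract
\frac{r_{it}}{r_{is}} = e^{\alpha_j + \theta_j}\ (\text{exposed}),
\qquad
\frac{r_{it}}{r_{is}} = e^{\alpha_j}\ (\text{control}),
\qquad
e^{\theta_j} = \frac{e^{\alpha_j + \theta_j}}{e^{\alpha_j}}
```

Each exp(theta_j) is therefore a ratio of rate ratios. It compares the change in an exposed tract's crime rate between timeline j and timeline -1 with the same change in a control tract, conditional on the tract random intercept.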

 

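Tract  Year  Timeline  Tract_pop  Crime_count  pre  post  exposed  log_population_yrs  violent_crime_rate  number_units_at_start  couplet
I      1995  -5        1205       81           1    0     1        7.09                67.22               1700                   1
I      1996  -4        1205       59           1    0     1        7.09                48.96               1700                   1
I      1997  -3        1205       26           1    0     1        7.09                21.58               1700                   1
I      1998  -2        1205       30           1    0     1        7.09                24.90               1700                   1
I      1999  -1        1205       24           0    0     1        7.09                19.92               1700                   1
I      2000  0         1205       26           0    1     1        7.09                21.58               1700                   1
I      2001  1         1205       18           0    1     1        7.09                14.94               1700                   1
I      2002  2         1205       10           0    1     1        7.09                8.30                1700                   1
I      2003  3         1205       13           0    1     1        7.09                10.79               1700                   1
I      2004  4         1205       11           0    1     1        7.09                9.13                1700                   1
I      2005  5         1991       17           0    1     1        7.60                8.54                1700                   1
C      1995  -5        290        9            1    0     0        5.67                31.03               850                    1
C      1996  -4        290        17           1    0     0        5.67                58.62               850                    1
C      1997  -3        290        7            1    0     0        5.67                24.14               850                    1
C      1998  -2        290        8            1    0     0        5.67                27.59               850                    1
C      1999  -1        290        7            0    0     0        5.67                24.14               850                    1
C      2000  0         290        7            0    1     0        5.67                24.14               850                    1
C      2001  1         290        6            0    1     0        5.67                20.69               850                    1
C      2002  2         290        4            0    1     0        5.67                13.79               850                    1
C      2003  3         290        13           0    1     0        5.67                44.83               850                    1
C      2004  4         290        6            0    1     0        5.67                20.69               850                    1
C      2005  5         244        8            0    1     0        5.50                32.79               850                    1

(I = intervention tract, C = control tract. violent_crime_rate is Crime_count per 1,000 residents, i.e. Crime_count / Tract_pop x 1,000, and log_population_yrs is the natural log of Tract_pop.)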
