Given a set of phone call coordinates within a city, I'm trying to predict one type of phone call (mcall) from another type (tcall) while controlling for daytime population (daypop). My data are aggregated to the census tract level (n = 142 tracts, used as a proxy for neighborhoods). Once I have a proper model in its simplest form, I will add population characteristics as covariates to see which ones are significant at the tract (neighborhood) level.
Data structure example:

```
TRACT  MCALL_COUNT  TCALL_COUNT  DAYPOP
01234        1,256          632   6,681
02468          875          458   4,200
...
```
Question 1: How do I use an offset properly when I need a rate for both types of phone call? Should I enter tcall in the model as a rate instead, on the natural-log scale (i.e., ln(tcall_count/daypop))?
Question 2: Is the RANDOM statement set up properly, so that all census tracts are included in the model's covariance structure?
Question 3: When I add more covariates (i.e., population characteristics), I run into convergence problems. Is this due to the model structure?
Question 4: Is there a better way to model the spatial autocorrelation among the census tract centroids (i.e., lat_centered and lon_centered)?
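For Question 1, one standard way to set up a rate-on-rate Poisson model is sketched below. The symbols are introduced here for illustration: $y_i$ is the mcall count, $t_i$ the tcall count and $P_i$ the daytime population of tract $i$. Only the response needs an offset; the tcall rate enters as an ordinary covariate on the log scale:

```latex
\begin{aligned}
\log\frac{\mathrm{E}[y_i]}{P_i} &= \beta_0 + \beta_1 \log\frac{t_i}{P_i} \\
\log \mathrm{E}[y_i] &= \underbrace{\log P_i}_{\text{offset}} + \beta_0 + \beta_1 \log\frac{t_i}{P_i} \\
&= \beta_0 + \beta_1 \log t_i + (1 - \beta_1)\log P_i
\end{aligned}
```

So ln(daypop) stays as the offset and ln(tcall_count/daypop) goes in as a predictor; the last line shows the same model written in terms of ln(tcall) directly. Entering raw tcall_count instead, as in the original code, makes the log rate linear in the count itself rather than in its logarithm.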
Here is the simplest version of what I've tried:
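```sas
proc glimmix data=analysis_n;
  ln_daypop = log(daypop);   /* log of daytime population, used as the offset */
  model mcall_count = tcall_count / dist=poisson offset=ln_daypop solution;
  /* SUBJECT=INTERCEPT: all tracts form one subject, so a single */
  /* exponential spatial covariance links every pair of tracts.  */
  random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
run;
```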
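Relevant to Question 2, here is the covariance that this RANDOM statement specifies, written out as a sketch based on the documented TYPE=SP(EXP) structure ($\sigma^2$, $\rho$ and $d_{ij}$ are symbols introduced here):

```latex
\begin{aligned}
R_{ij} = \operatorname{Cov}(\varepsilon_i, \varepsilon_j) &= \sigma^2 \exp\!\left(-\frac{d_{ij}}{\rho}\right), \qquad i, j = 1, \dots, 142 \\
d_{ij} &= \sqrt{(\mathrm{lat}_i - \mathrm{lat}_j)^2 + (\mathrm{lon}_i - \mathrm{lon}_j)^2}
\end{aligned}
```

Because SUBJECT=INTERCEPT puts all 142 tracts in one subject, $R$ is a single dense 142 x 142 matrix: every pair of tracts is correlated, with the correlation decaying as the Euclidean distance between centroids grows. The distance is computed in whatever units lat_centered and lon_centered are stored in.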
SAS Version 9.4 M5
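A possible revision of the simplest model, sketched under stated assumptions, puts both ideas from Questions 1 and 4 into code. SP(EXP) uses Euclidean distance in the coordinates' own units, so if lat_centered and lon_centered are centered decimal degrees, a degree of longitude is treated as being as long as a degree of latitude even though it is shorter away from the equator. The sketch converts the centroids to approximate kilometres (about 111 km per degree, with east-west degrees scaled by the cosine of latitude, which is adequate at city scale) and enters the tcall rate as a covariate. The macro variable lat_ref and the names analysis_rate, ln_tcall_rate, x_km and y_km are hypothetical:

```sas
/* Hypothetical: approximate latitude of the city, in degrees. */
%let lat_ref = 40;

data analysis_rate;
  set analysis_n;
  ln_daypop     = log(daypop);                /* offset: log exposure       */
  ln_tcall_rate = log(tcall_count / daypop);  /* tcall rate, as a covariate */
  /* Assumes lat_centered/lon_centered are centered decimal degrees.  */
  /* ~111 km per degree; east-west degrees shrink by cos(latitude).   */
  y_km = lat_centered * 111.0;
  x_km = lon_centered * 111.0 * cos(&lat_ref. * constant('pi') / 180);
run;

proc glimmix data=analysis_rate;
  model mcall_count = ln_tcall_rate / dist=poisson link=log
        offset=ln_daypop solution;
  /* Same single-subject exponential structure, now in km units. */
  random _residual_ / subject=intercept type=sp(exp)(x_km y_km);
run;
```

With coordinates in kilometres, the estimated range parameter of the exponential covariance can be read directly as a distance.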
If you are mostly interested in prediction rather than inference, you might be willing to venture into statistically shaky territory. How would the model log(mcall/tcall) = a*log(daypop) + b fit with OLS, for example?
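A minimal sketch of that OLS fit, assuming the variable names from the original post (mcall_count, tcall_count, daypop); ols_in, ln_ratio and ln_daypop are illustrative names:

```sas
data ols_in;
  set analysis_n;
  ln_ratio  = log(mcall_count / tcall_count);  /* log(mcall/tcall) */
  ln_daypop = log(daypop);                     /* log(daypop)      */
run;

proc reg data=ols_in;
  model ln_ratio = ln_daypop;   /* slope estimates a, intercept estimates b */
run;
quit;
```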
Thanks PG Stats. I took your suggestion of OLS and dropped the Poisson part of the model, since my counts are all fairly high. Are there any concerns with structuring the model this way?
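```sas
%macro poisson_model(covars);
  /* No DIST= option, so GLIMMIX uses its default normal distribution:  */
  /* a linear model on the log scale with spatially correlated errors   */
  /* (generalized rather than ordinary least squares, given the R matrix). */
  proc glimmix data=analysis_n plots=all;
    ln_daypop = log(est_daypop);
    lnmcall   = log(mcall);
    lntcall   = log(tcall);
    model lnmcall = lntcall ln_daypop &covars. / solution;
    random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
  run;
%mend poisson_model;
```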
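Written out, the macro's model (ignoring &covars.) is a power law on the original scale. A sketch, with $m_i$, $t_i$ and $P_i$ introduced here as the mcall, tcall and daytime-population values of tract $i$:

```latex
\begin{aligned}
\log m_i &= \beta_0 + \beta_1 \log t_i + \beta_2 \log P_i + \varepsilon_i \\
\Longleftrightarrow\quad m_i &= e^{\beta_0}\, t_i^{\beta_1}\, P_i^{\beta_2}\, e^{\varepsilon_i}
\end{aligned}
```

PG Stats's suggestion, log(mcall/tcall) = a*log(daypop) + b, is the special case $\beta_1 = 1$, $\beta_2 = a$, $\beta_0 = b$, so estimating $\beta_1$ freely generalizes it. Back-transforming a fitted value with exp gives a geometric-mean prediction (the median, under normal errors) rather than a mean.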
The covars macro parameter would eventually hold population characteristics chosen by forward selection, such as median household income and percent employed.
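Since Question 3 mentions convergence problems once covariates are added, one common remedy worth trying (not guaranteed to fix it) is to standardize covariates that sit on very different scales, such as income in dollars next to a percentage. A sketch with PROC STDIZE; med_hh_income, pct_employed and analysis_std are hypothetical names:

```sas
/* Hypothetical covariate names; replace with the real ones. */
proc stdize data=analysis_n out=analysis_std method=std;
  var med_hh_income pct_employed;   /* each rescaled to mean 0, SD 1 */
run;
```

The macro reads analysis_n, so its DATA= option would need to point at analysis_std. Coefficients are then expressed per standard deviation of each covariate.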
ln(tcall) is a significant predictor of ln(mcall), so now I'm simply trying to find population characteristics that are significantly associated with mcall after controlling for tcall.
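One way to produce candidates for that forward selection is PROC GLMSELECT, sketched below with hypothetical covariate names. GLMSELECT fits least squares with independent errors, so it ignores the spatial covariance in the GLIMMIX model; treat its picks as a screening step and refit them through the macro:

```sas
/* Screening sketch only: GLMSELECT assumes independent errors.   */
/* med_hh_income and pct_employed are hypothetical variable names. */
data analysis_log;
  set analysis_n;
  lnmcall   = log(mcall);
  lntcall   = log(tcall);
  ln_daypop = log(est_daypop);
run;

proc glmselect data=analysis_log;
  /* INCLUDE=2 keeps lntcall and ln_daypop in every candidate model. */
  model lnmcall = lntcall ln_daypop med_hh_income pct_employed
        / selection=forward(select=sbc) include=2;
run;
```

Any covariates it selects could then be passed to the macro, for example %poisson_model(med_hh_income pct_employed).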
Sounds like a good strategy. Do not underestimate the value of data visualization; it can save you hours of aimless wandering. Plot the residuals against your predicted values and against your other predictors. Forward selection might give you a starting point, but the final model should be one that makes sense to you.
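A sketch of those residual plots for the log-scale model without extra covariates, assuming the variable names used in the macro (mcall, tcall, est_daypop); resids, p_lnmcall and r_lnmcall are illustrative names. The OUTPUT statement writes predicted values and residuals alongside the input variables, and PROC SGPLOT plots them:

```sas
data analysis_log;
  set analysis_n;
  lnmcall   = log(mcall);
  lntcall   = log(tcall);
  ln_daypop = log(est_daypop);
run;

proc glimmix data=analysis_log;
  model lnmcall = lntcall ln_daypop / solution;
  random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
  output out=resids pred=p_lnmcall resid=r_lnmcall;   /* fitted values and residuals */
run;

/* Residuals against fitted values, with a loess smooth to reveal trends */
proc sgplot data=resids;
  scatter x=p_lnmcall y=r_lnmcall;
  loess x=p_lnmcall y=r_lnmcall / nomarkers;
  refline 0 / axis=y;
run;

/* Residuals against one predictor; repeat for each covariate */
proc sgplot data=resids;
  scatter x=lntcall y=r_lnmcall;
  refline 0 / axis=y;
run;
```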