rburnham
Calcite | Level 5

Given a set of phone call coordinates within a city, I'm attempting to predict one type of phone call (mcall) from another type of phone call (tcall) while controlling for the daytime population (daypop) of the city. My data is at the census tract level (n=142 census tracts, used as a proxy for neighborhoods). Once I have a proper model at the simplest level, I will add population characteristics as covariates to try to identify any that are significant at the census tract (neighborhood) level.

 

Data Structure Example:

TRACT    MCALL_COUNT    TCALL_COUNT    DAYPOP
01234    1,256          632            6,681
02468    875            458            4,200
...

 

Question 1: How do I properly use an offset if I need a rate for both types of phone calls? Should I instead feed tcall_count into the model as a rate and take its natural log [i.e. ln(tcall_count/daypop)]?

Question 2: Is the random statement set up properly so that all census tracts are included in the covariance structure of the model?

Question 3: When I add more covariates (i.e. population characteristics), I have trouble with convergence. Is this due to the model structure?

Question 4: Is there a better way to model the spatial autocorrelation of the centroids of the census tracts (i.e. lat_centered lon_centered)?

 

Here's the simplest version of what I've tried:

	proc glimmix data=analysis_n;
		ln_daypop = log(daypop);
		model mcall_count = tcall_count / dist=poisson offset=ln_daypop solution;
		random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
	run;
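Written out, and assuming the default log link that GLIMMIX uses with dist=poisson, the mean structure of the code above is roughly the first line below (the offset enters the linear predictor with its coefficient fixed at 1, so the left-hand side is effectively an mcall rate per daytime resident). The second line is what the alternative in Question 1 would look like. The symbols \beta_0 and \beta_1 are notation introduced here, not SAS output names.

```latex
\begin{align*}
\log \mathrm{E}[\text{mcall\_count}_i]
  &= \log(\text{daypop}_i) + \beta_0 + \beta_1\,\text{tcall\_count}_i
  && \text{(as coded: raw tcall count as predictor)} \\
\log \mathrm{E}[\text{mcall\_count}_i]
  &= \log(\text{daypop}_i) + \beta_0
     + \beta_1 \log\!\left(\frac{\text{tcall\_count}_i}{\text{daypop}_i}\right)
  && \text{(Question 1: log tcall rate as predictor)}
\end{align*}
```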

SAS Version 9.4 M5

PGStats
Opal | Level 21

If you are mostly interested in prediction, and not in inference, you might be willing to venture into statistically shaky territory... How would the model log(mcall/tcall) = a*log(daypop) + b fit with OLS, for example?
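A minimal sketch of that OLS fit, assuming the dataset and variable names from the original post (analysis_n, mcall_count, tcall_count, daypop). The names analysis_log, ln_ratio and ln_daypop are made up for illustration, and the sketch ignores any spatial correlation between tracts:

```sas
/* Hypothetical sketch: build the log-ratio response and log predictor. */
data analysis_log;
    set analysis_n;
    /* Keep only tracts where every log is defined */
    if mcall_count > 0 and tcall_count > 0 and daypop > 0;
    ln_ratio  = log(mcall_count / tcall_count);
    ln_daypop = log(daypop);
run;

/* Ordinary least squares: the slope estimates a, the intercept estimates b */
proc reg data=analysis_log;
    model ln_ratio = ln_daypop;
run;
quit;
```

Modeling log(mcall/tcall) is the same as regressing log(mcall) on log(tcall) with the log(tcall) coefficient fixed at 1, so a version that estimates that coefficient freely is a natural next step.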

PG
rburnham
Calcite | Level 5

Thanks, PGStats. I took your suggestion of OLS and dropped the Poisson part of the model, since my counts are mostly high. Any concerns with structuring the model this way?

 

%macro poisson_model(covars);
	proc glimmix data=analysis_n plots=all;
		ln_daypop = log(est_daypop);
		lnmcall = log(mcall);
		lntcall = log(tcall);
		model lnmcall = lntcall ln_daypop &covars. / solution;
		random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
	run;
%mend poisson_model;

The covars macro variable would eventually include population characteristics (using forward selection) such as median household income, % employed, etc.

 

 

ln(tcall) is a significant predictor of ln(mcall), and now I'm simply trying to find significant population characteristics associated with the mcalls given I'm controlling for the tcalls.

PGStats
Opal | Level 21

Sounds like a good strategy. Do not underestimate the value of data visualization; it can save you hours of aimless wandering. Plot the residuals against your predicted values and against your other predictors. Forward selection might give you a starting point, but the final model should be one that makes sense to you.
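One possible way to get those plots, sketched under the assumption that the variable names match the macro above (mcall, tcall, est_daypop, lat_centered, lon_centered). The dataset names analysis_log and diag and the output variables pred and resid are hypothetical. The logs are computed in a DATA step first so that they are certain to be carried into the output dataset:

```sas
/* Hypothetical sketch: precompute the logged variables */
data analysis_log;
    set analysis_n;
    lnmcall   = log(mcall);
    lntcall   = log(tcall);
    ln_daypop = log(est_daypop);
run;

/* Refit the base model and save predicted values and residuals */
proc glimmix data=analysis_log;
    model lnmcall = lntcall ln_daypop / solution;
    random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
    output out=diag pred=pred resid=resid;
run;

/* Residuals vs. predicted values: look for curvature or funnel shapes */
proc sgplot data=diag;
    scatter x=pred y=resid;
    refline 0 / axis=y;
run;

/* Residuals vs. one predictor; repeat for each covariate of interest */
proc sgplot data=diag;
    scatter x=ln_daypop y=resid;
    refline 0 / axis=y;
run;
```

The plots=all option already on the PROC GLIMMIX statement should produce a residual panel as well; separate SGPLOT calls just make it easier to plot against candidate covariates that are not yet in the model.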

PG
rburnham
Calcite | Level 5
Thanks for your help! After residual diagnostic review, I'm happy with the model fit.
