SAS Procedures

rburnham · Posted 11-21-2018 05:13 PM

Given a set of phone call coordinates within a city, I'm attempting to predict one type of phone call (mcall) with another type of phone call (tcall) while controlling for the daytime population (daypop) of the city. My data are at the census tract level (n=142 census tracts, used as a proxy for neighborhoods). Eventually, once I have a sound model at the simplest level, I will add population characteristics as covariates to determine which characteristics are significant at the census tract (neighborhood) level.

Data Structure Example:

TRACT   MCALL_COUNT   TCALL_COUNT   DAYPOP
01234         1,256           632    6,681
02468           875           458    4,200
...

Question 1: How do I properly use an offset if I need a rate for both types of phone calls? Would I enter tcall_count into the model as a rate and take its natural log, i.e. ln(tcall_count/daypop)?

Question 2: Is the random statement set up properly so that all census tracts are included in the covariance structure of the model?

Question 3: When I add more covariates (i.e. population characteristics), I have trouble with convergence. Is this due to the model structure?

Question 4: Is there a better way to model the spatial autocorrelation of the centroids of the census tracts (i.e. lat_centered lon_centered)?

Here's the simplest version of what I've tried:

	proc glimmix data=analysis_n;
		ln_daypop = log(daypop);
		model mcall_count = tcall_count / dist=poisson offset=ln_daypop solution;
		random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
	run;

SAS Version 9.4 M5
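
On Question 1, one way to read the offset: with a log link, offset=ln_daypop fixes the coefficient of log(daypop) at 1, so the model describes the rate mcall_count/daypop rather than the raw count. A minimal sketch of the rate-on-rate version asked about follows. It assumes the column names from the data example, leaves out the spatial random statement for brevity, and uses hypothetical names (analysis_rate, ln_trate) that are not from the original post. Whether this specification suits the data has not been checked here.

	/* Hypothetical data step: build the offset and the log of the tcall rate */
	data analysis_rate;
		set analysis_n;
		if daypop > 0 then ln_daypop = log(daypop);
		/* tracts with zero tcall_count keep ln_trate missing and drop out of the fit */
		if daypop > 0 and tcall_count > 0 then ln_trate = log(tcall_count / daypop);
	run;

	/* Poisson model for the mcall rate, with the log tcall rate as predictor */
	proc glimmix data=analysis_rate;
		model mcall_count = ln_trate / dist=poisson link=log offset=ln_daypop solution;
	run;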

PGStats · Posted 11-22-2018 02:58 PM

If you are mostly interested in prediction rather than inference, you might be willing to venture into statistically shaky territory... How would the model log(mcall/tcall) = a*log(daypop) + b fit with OLS, for example?

PG
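
The OLS model suggested above could be tried roughly as follows. This is only a sketch: it assumes the column names from the data example in the first post (mcall_count, tcall_count, daypop), and the data set name analysis_ols and the variable ln_ratio are hypothetical.

	/* Hypothetical data step: log ratio of the two call counts and log daytime population */
	data analysis_ols;
		set analysis_n;
		/* tracts with a zero count keep missing values and are excluded by PROC REG */
		if mcall_count > 0 and tcall_count > 0 and daypop > 0 then do;
			ln_ratio = log(mcall_count / tcall_count);
			ln_daypop = log(daypop);
		end;
	run;

	/* OLS fit of log(mcall/tcall) = a*log(daypop) + b */
	proc reg data=analysis_ols;
		model ln_ratio = ln_daypop;
	run;
	quit;

In the PROC REG output, the ln_daypop estimate corresponds to a and the intercept to b.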

rburnham · Posted 11-27-2018 05:28 PM

Thanks, PGStats. I took your suggestion of OLS and dropped the Poisson part of the model, since my counts are plenty high. Any concerns with structuring the model this way?

%macro poisson_model(covars);
	proc glimmix data=analysis_n plots=all;
		ln_daypop = log(est_daypop);
		lnmcall = log(mcall);
		lntcall = log(tcall);
		model lnmcall = lntcall ln_daypop &covars. / solution;
		random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
	run;
%mend poisson_model;
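
For comparison with the earlier ratio model, the two forms can be written side by side. The symbols \beta_0, \beta_1, \beta_2 are labels introduced here for illustration and do not appear in the thread. The ratio model is equivalent to fixing the coefficient of log(tcall) at 1:

	\log\frac{\text{mcall}}{\text{tcall}} = a\,\log(\text{daypop}) + b
	\quad\Longleftrightarrow\quad
	\log(\text{mcall}) = 1\cdot\log(\text{tcall}) + a\,\log(\text{daypop}) + b

The model in the macro estimates that coefficient instead:

	\log(\text{mcall}) = \beta_0 + \beta_1\,\log(\text{tcall}) + \beta_2\,\log(\text{daypop}) + \dots

so an estimate of \beta_1 close to 1 would bring it back to the ratio form.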

The covars macro variable would eventually hold population characteristics chosen by forward selection, such as median household income, % employed, etc.

ln(tcall) is a significant predictor of ln(mcall), and now I'm simply trying to find significant population characteristics associated with mcalls while controlling for tcalls.
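
A possible way to run the forward selection step is PROC GLMSELECT, sketched below. This is an assumption rather than what was actually done in this thread: the data set analysis_log and the covariate names med_hh_income and pct_employed are hypothetical, and the 0.05 entry level is arbitrary. PROC GLMSELECT fits by ordinary least squares and does not account for the spatial covariance, so its result is at most a screening step before refitting in PROC GLIMMIX.

	/* Hypothetical data step: same log transforms as in the macro */
	data analysis_log;
		set analysis_n;
		ln_daypop = log(est_daypop);
		lnmcall = log(mcall);
		lntcall = log(tcall);
	run;

	/* Forward selection; include=2 keeps lntcall and ln_daypop in every model */
	proc glmselect data=analysis_log;
		model lnmcall = lntcall ln_daypop med_hh_income pct_employed
			/ selection=forward(select=sl sle=0.05 include=2);
	run;

The covariates that survive could then be passed to the macro, for example %poisson_model(med_hh_income).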

PGStats · Posted 11-27-2018 11:53 PM

Sounds like a good strategy. Do not underestimate the value of data visualization; it can save you hours of aimless wandering. Plot the residuals against your predicted values and your other predictors. Forward selection might give you a starting point, but the final model should be one that makes sense to you.

PG
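
The residual plots described above could be produced along these lines. This is a sketch under assumptions: the data set names analysis_log and diag and the output variable names pred and resid are hypothetical, and the model repeats the structure of the macro without extra covariates.

	/* Hypothetical data step: same log transforms as in the macro */
	data analysis_log;
		set analysis_n;
		ln_daypop = log(est_daypop);
		lnmcall = log(mcall);
		lntcall = log(tcall);
	run;

	/* Refit the model and save predicted values and residuals */
	proc glimmix data=analysis_log;
		model lnmcall = lntcall ln_daypop / solution;
		random _residual_ / subject=intercept type=sp(exp)(lat_centered lon_centered);
		output out=diag pred=pred resid=resid;
	run;

	/* Residuals against predicted values */
	proc sgplot data=diag;
		scatter x=pred y=resid;
		refline 0 / axis=y;
	run;

	/* Residuals against one predictor; repeat for the others */
	proc sgplot data=diag;
		scatter x=ln_daypop y=resid;
		refline 0 / axis=y;
	run;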

rburnham · Posted 11-28-2018 11:49 AM

Thanks for your help! After residual diagnostic review, I'm happy with the model fit.

Using Counts to Predict Counts in a Geospatial Setting
