newtriks
Obsidian | Level 7
Hello, this may be a stupid question but I'm having trouble interpreting my output. 

Code: 
DATA NPS;
INPUT Year Visits EFG STU;
YearIndx = Year-2012;
LogVisits = Log(Visits/1000000);
Event=1; Incident=EFG; Rate=EFG/(Visits/1000000); OUTPUT;
Event=0; Incident=STU; Rate=STU/(Visits/1000000); OUTPUT;
DATALINES;
2013 273630895 4187 620 
2014 292800082 5498 796 
2015 307247252 6160 1283 
2016 330971689 6753 3196 
2017 330882751 6605 3114 
2018 318211833 6111 3214 
2019 327516619 2820 2976 
2020 237064332 1463 2993 
;
PROC GENMOD DATA=BAKER.NPSvisittrendsCOVID plots=all;
model STU = year / dist=negbin link=log offset=LogVisits type3;
RUN;

Maximum likelihood parameter estimates from PROC GENMOD:
Parameter Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr>ChiSq
Intercept -504.867 93.7332 -688.580 -321.153 29.01 <.0001
Year 0.2445 0.0465 0.1533 0.3356 27.66 <.0001
Dispersion 0.0737 0.0367 0.0278 0.1955

The way I'm interpreting it is this: Exp(-504.867 + Year*0.2445) = STU. This is clearly wrong, because when I calculate that I get nothing close to the STU number. What am I missing? Thanks in advance.
FreelanceReinh
Jade | Level 19

Hello @newtriks,

 


@newtriks wrote:
PROC GENMOD DATA=BAKER.NPSvisittrendsCOVID plots=all;
model STU = year / dist=negbin link=log offset=LogVisits type3;
RUN;

Maximum likelihood parameter estimates from PROC GENMOD:
Parameter Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr>ChiSq
Intercept -504.867 93.7332 -688.580 -321.153 29.01 <.0001
Year 0.2445 0.0465 0.1533 0.3356 27.66 <.0001
Dispersion 0.0737 0.0367 0.0278 0.1955

The way I'm interpreting it is this: Exp(-504.867 + Year*0.2445) = STU. This is clearly wrong, because when I calculate that I get nothing close to the STU number. What am I missing?

The offset is missing. Exp(LogVisits - 504.867 + Year*0.2445) will be closer to STU.
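
One way to check this without hand arithmetic is to let PROC GENMOD write the fitted means itself, since its predicted values already include the offset. The sketch below is only an illustration under stated assumptions, not the original poster's run: it fits the posted model to the NPS data set created by the DATA step in the question, the names Pred and PredSTU are made up for this example, and the WHERE statement keeps one row per year because that DATA step writes two rows per year.

```sas
/* Sketch: fit the posted model to NPS and save the fitted means.     */
/* Pred and PredSTU are hypothetical names chosen for this example.   */
proc genmod data=NPS;
   where Event=0;   /* the NPS step outputs two rows per year; keep one */
   model STU = Year / dist=negbin link=log offset=LogVisits;
   output out=Pred pred=PredSTU;   /* fitted mean, offset included */
run;

/* List observed and fitted counts side by side */
proc print data=Pred noobs;
   var Year STU LogVisits PredSTU;
run;
```

PredSTU should agree with exp(LogVisits + intercept + Year*slope) when the intercept and slope are taken from that same run, which makes a hand calculation that leaves out the offset easy to spot.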

newtriks
Obsidian | Level 7

Thanks for responding - it still doesn't appear to work, though.

Let's take 2020, for example. Logvisits = log(park_visits/1000000), or log(237.064332), which equals an offset of 5.471.

So the expression yielding the predicted value would be exp(5.471 - 504.867 + 2020*0.2445).  This yields 0.004 predicted, 2993 actual.

I'm doing something wrong but I can't place my finger on it.

Any help you might provide would be greatly appreciated. Thanks!

ballardw
Super User

I don't use GENMOD, so walk me through what your NPS data step is doing. I think this may be important: you show us code that creates NPS, but the GENMOD call uses a different data set, BAKER.NPSvisittrendsCOVID. The NPS data step creates the variables Event and Incident, but the GENMOD code does not use them anywhere that I can see. So are you sure that the GENMOD code is correct for the data set shown? When I run that GENMOD code on the given data set, the results are not what you show, so something seems a bit off:

For a start, the intercept estimate and all the standard errors are different.
Analysis Of Maximum Likelihood Parameter Estimates
Parameter DF Estimate Standard Error Wald 95% Confidence Limits Wald Chi-Square Pr > ChiSq
Intercept 1 -491.051 66.2794 -620.956 -361.146 54.89 <.0001
Year 1 0.2445 0.0329 0.1800 0.3089 55.31 <.0001
Dispersion 1 0.0737 0.0259 0.0370 0.1469

 

It is confusing to introduce terms like your "park_visits" that do not appear in the data. If I have to guess that a variable named "visits" is supposed to be treated as "park_visits", I get very uncomfortable: I have seen far too much data with similar variable names to be happy with that sort of assumption.

FreelanceReinh
Jade | Level 19

@newtriks wrote:

Thanks for responding - it still doesn't appear to work, though.

Let's take 2020, for example. Logvisits = log(park_visits/1000000), or log(237.064332), which equals an offset of 5.471.


log(237.064332)=5.468331...

 

As ballardw has pointed out already, your intercept estimate -504.867 is not consistent with your data, for which your PROC GENMOD code ([edit:] i.e., applied to dataset NPS) yields -491.051. The seemingly small relative difference between these numbers has a big impact once the exponential function is applied: the result for 2020 is 4053.48 (same order of magnitude as STU=2993) as opposed to 0.004051... The factor of (close to) 1,000,000 between these results, namely exp(504.867-491.051), suggests that your incorrect intercept is due to a missing division (or multiplication) by 1,000,000 at some point in your calculation.
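
The arithmetic for 2020 can be reproduced with a short DATA step. This is only a hand check that plugs in the rounded estimates quoted in this thread; the variable names (offset, pred_nps, pred_posted, ratio) are made up for the illustration.

```sas
/* Hand check of the 2020 prediction using the rounded estimates   */
/* quoted in the thread. All variable names are hypothetical.      */
data _null_;
   year   = 2020;
   visits = 237064332;
   slope  = 0.2445;                    /* Year estimate, same in both outputs */
   offset = log(visits/1000000);       /* natural log, about 5.4683 */

   pred_nps    = exp(offset - 491.051 + slope*year);  /* intercept from NPS fit: about 4053 */
   pred_posted = exp(offset - 504.867 + slope*year);  /* posted intercept: about 0.00405 */
   ratio       = exp(504.867 - 491.051);              /* close to 1,000,000 */

   put offset= 10.6 pred_nps= 10.2 pred_posted= 10.6 ratio= comma12.;
run;
```

The values are written to the SAS log. Because the two intercepts differ by roughly log(1000000), about 13.8155, the two predictions differ by a factor of about one million.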
