Hello @newtriks,
@newtriks wrote:
PROC GENMOD DATA=BAKER.NPSvisittrendsCOVID plots=all;
model STU = year / dist=negbin link=log offset=LogVisits type3;
RUN;
Maximum likelihood parameter estimates from PROC GENMOD:
| Parameter | Estimate | Standard Error | Wald 95% Lower Limit | Wald 95% Upper Limit | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | -504.867 | 93.7332 | -688.580 | -321.153 | 29.01 | <.0001 |
| Year | 0.2445 | 0.0465 | 0.1533 | 0.3356 | 27.66 | <.0001 |
| Dispersion | 0.0737 | 0.0367 | 0.0278 | 0.1955 | | |
The way I'm interpreting this is: exp(-504.867 + Year*0.2445) = STU. This is clearly wrong, because when I calculate that I get nothing close to the STU number. What am I missing?
The offset is missing. Exp(LogVisits - 504.867 + Year*0.2445) will be closer to STU.
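To illustrate why the offset belongs inside the exponential, here is a minimal Python sketch (plain arithmetic, not SAS). It assumes a log-link model with mean exp(offset + intercept + slope*year). The exposure and coefficient values are made up for illustration and are not taken from the data in this thread; the function name predicted_count is likewise hypothetical.

```python
import math

def predicted_count(log_exposure, intercept, slope, year):
    """Mean of a log-link count model with an offset term."""
    return math.exp(log_exposure + intercept + slope * year)

# Hypothetical values, chosen only to illustrate the identity below.
exposure = 50.0                      # e.g. visits in millions (made up)
b0, b1, year = -3.0, 0.002, 2000     # made-up intercept, slope, and year

with_offset = predicted_count(math.log(exposure), b0, b1, year)
rate_only = math.exp(b0 + b1 * year)  # exp(intercept + slope*year) alone

# The offset turns the modeled rate into a count: count = exposure * rate.
print(with_offset, exposure * rate_only)
```

Without the offset, exp(intercept + slope*year) is only the rate per unit of exposure, so it cannot be compared directly with the observed count.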
Thanks for responding - it still doesn't appear to work, though.
Let's take 2020, for example. Logvisits = log(park_visits/1000000), or log(237.064332), which equals an offset of 5.471.
So the expression yielding the predicted value would be exp(5.471 - 504.867 + 2020*0.2445). This yields 0.004 predicted, 2993 actual.
I'm doing something wrong but I can't put my finger on it.
Any help you might provide would be greatly appreciated. Thanks!
I don't use Genmod, so walk me through what your NPS step is doing. I think this may be important: you show us code that creates a data set NPS, but the Genmod step uses a different set, NPSvisittrendsCOVID. In the NPS set you create the variables Event and Incident but do not use them anywhere in the Genmod code that I can see. So are you sure that the Genmod code is correct for the data set shown? When I run that Genmod code on the given data set, the results are not what you show, so something seems a bit off:
Analysis Of Maximum Likelihood Parameter Estimates

| Parameter | DF | Estimate | Standard Error | Wald 95% Lower Limit | Wald 95% Upper Limit | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | -491.051 | 66.2794 | -620.956 | -361.146 | 54.89 | <.0001 |
| Year | 1 | 0.2445 | 0.0329 | 0.1800 | 0.3089 | 55.31 | <.0001 |
| Dispersion | 1 | 0.0737 | 0.0259 | 0.0370 | 0.1469 | | |
It is confusing to introduce terms like your "park_visits" that do not appear in the data. If I have to guess that a variable named "visits" is supposed to be treated as "park_visits" I get very uncomfortable as I have seen just too much data with similar variable names to like that sort of assumption.
@newtriks wrote:
Thanks for responding - it still doesn't appear to work, though.
Let's take 2020, for example. Logvisits = log(park_visits/1000000), or log(237.064332), which equals an offset of 5.471.
log(237.064332)=5.468331...
As ballardw has pointed out already, your intercept estimate -504.867 is not consistent with your data, for which your PROC GENMOD code ([edit:] i.e., applied to dataset NPS) yields -491.051. The seemingly small relative difference between these numbers has a big impact when the exponential function is applied: the result for 2020 is 4053.48 (same order of magnitude as STU=2993) as opposed to 0.004051... The factor of (close to) 1,000,000 between these results, namely exp(504.867-491.051), suggests that your incorrect intercept is due to a missing division (or multiplication) by 1,000,000 at some point in your calculation.
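For anyone who wants to check the arithmetic, here is a small Python sketch (not SAS) that recomputes the 2020 prediction with both intercepts and the ratio between them. It uses only numbers quoted in this thread; the helper name predict is hypothetical, and the slope is the rounded value 0.2445 from the output, so the last digits can differ slightly from a full-precision calculation.

```python
import math

offset = math.log(237.064332)   # LogVisits for 2020 (visits in millions)
slope, year = 0.2445, 2020      # rounded Year estimate from the output

def predict(intercept):
    """Predicted STU: exp(offset + intercept + slope*year)."""
    return math.exp(offset + intercept + slope * year)

pred_posted = predict(-504.867)  # intercept from the originally posted output
pred_nps = predict(-491.051)     # intercept from running the code on NPS

# Ratio of the two predictions; depends only on the intercept difference.
ratio = math.exp(504.867 - 491.051)

print(f"offset            = {offset:.6f}")
print(f"posted intercept -> {pred_posted:.6f}")
print(f"NPS intercept    -> {pred_nps:.2f}")
print(f"ratio             = {ratio:,.0f}")
```

Because the ratio depends only on the difference between the intercepts, it is the same for every year, which is what points to a scaling problem rather than a problem with one observation.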