akimme
Obsidian | Level 7

I'm trying to do binomial regression modeling with some biomedical data on the web version of SAS, but PROC GENMOD is giving me this error message for about 36 out of the 40 variables I want to incorporate:


ERROR: The mean parameter is either invalid or at a limit of its range for some observations.


As I understand it, this means the predicted probability (the mean) for some observations is landing at or outside the 0 to 1 range, and since a probability can't be less than nobody or more than everybody, the procedure errors out. This is a bit mysterious to me because I did not see any group (for example, a race group) where everyone had the same outcome on my response variable.

 

Anyway, I get graphs like this:

[Image: graph with yes and no all the way in a corner rather than along the dotted line]


I have some idea of what's probably wrong, but I'm not familiar enough with SAS coding to fix it on my own and would really appreciate some help.

 

SAS is also putting up warnings that my model is not converging, which I suspect is because most of my variables are not yet added, but if you think it may be something else, please explain.

 

Other info:

The variables that run are all ordinal (scale of 1-5, etc). Just about everything else has a character string ("always," "often"...) for each option.

 

*only variables that run;
proc genmod;
  class connectn ratherknown normaln explainn age / param=glm;
  model longbi = connectn ratherknown normaln explainn age / dist=bin link=log;
  lsmeans connectn ratherknown normaln explainn age / diff exp cl;
run;

*example code with the error;
proc genmod;
  class connectn ratherknown normaln explainn age
        White Black Asian Native Pacific MidEast Mixed raceOther / param=glm;
  model longbi = connectn ratherknown normaln explainn age
        White Black Asian Native Pacific MidEast Mixed raceOther / dist=bin link=log;
  lsmeans connectn ratherknown normaln explainn age
        White Black Asian Native Pacific MidEast Mixed raceOther / diff exp cl;
run;

[Image: graph for age with thin gray lines running horizontal and vertical but still no blue or red significant/not significant marks]
[Image: graph for Black race with yes and no in the bottom right corner]
[Image: least squares graphs for Black race predictor with a lot of missing/weird outputs]
[Image: least squares graphs for Other race predictor with a lot of missing/weird outputs]

[Image: model fit criteria (poor)]

7 REPLIES
StatDave
SAS Super FREQ

This, again, is addressed in the note that I referred you to in your last post about this log binomial model. That note describes the fitting problems that are quite likely with this model, because the log link does not ensure that predicted values are restricted to the [0,1] range, and it links to another note that covers the error you are seeing. The more complex the model, the more likely these fitting problems become. So I again suggest that you try one of the other modeling approaches discussed in that note for estimating relative risks.

akimme
Obsidian | Level 7
I'm still pretty new to SAS and it's hard for me to follow the notes if they don't have examples. Is there some example code you could point me to?
akimme
Obsidian | Level 7

Actually, my questions have been too broad. Maybe I should have posted in New to SAS, but log binomial seems to be a less common method. Let's try:

 

1. I tested the original character variable explain (“always,” “often,” etc.) against its numeric recode explainn (-2 to 2), but only explainn works. When I recode variables like that, I check them with PROC FREQ, so I know that if there were, for example, 104 “always” responses and 170 “often” responses, the counts are the same for both explain and explainn. If SAS's estimate falls outside the 0-1 range for one of them despite identical counts, that tells me SAS is handling strings differently from numerical values.

 

Where in that code does it tell SAS to weight character strings differently than numbers and how do I fix that?

 

2. If LSMEANS is used for categorical variables and ESTIMATE for continuous/numerical variables, why are none of my categorical/string variables working? Only the ones with ordinal values, expressed in numbers -2 to 2 or 0 to 5, are running. Why don’t I need ESTIMATE for that? Is this a different version problem? I’m on the online SAS version.

 

Why is LSMEANS running only numerical data?

 

Changing from log to logit changed some errors into warnings but I need to run some other checks before I commit to it (and it might mean I need to update some other models, which I’d prefer to avoid). Either way, I need to make sure SAS is weighting number and string variables equally in the model or find a way to use exclusively one or the other. 

sbxkoenk
SAS Super FREQ

Hello @akimme ,

 

There is NO difference between a class variable (categorical covariate) that is coded as character strings and one that is coded numerically.
The only catch is that the reference category may not be the same.

When the GENMOD procedure builds the design matrix, it picks a reference category. You can choose it yourself, but if you don't, a default is used based on the sort order of the levels (typically the last level in sorted order). That default can of course differ between the character coding and the numeric coding.


Use the

  • PARAM=keyword
  • REF='level'

options for the CLASS variables
to be sure you are comparing what can be compared.
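
As a rough illustration of the REF= suggestion above, here is a minimal sketch that pins the same reference level for the character and numeric versions of one predictor. The data set name work.mydata is hypothetical, and the mapping of "always" to the numeric code 2 is an assumption made only for this example; substitute whatever your own recode uses.

```sas
/* Sketch only: work.mydata and the 'always' <-> 2 mapping are assumptions. */

/* Character version: name the reference level explicitly. */
proc genmod data=work.mydata;
  class explain (ref='always') / param=glm;
  model longbi = explain / dist=bin link=log;
  lsmeans explain / diff exp cl;
run;

/* Numeric version: REF= takes the formatted value in quotes,
   so the code assumed to mean 'always' is written as '2'. */
proc genmod data=work.mydata;
  class explainn (ref='2') / param=glm;
  model longbi = explainn / dist=bin link=log;
  lsmeans explainn / diff exp cl;
run;
```

If the two runs still behave differently once the reference levels match, the "Class Level Information" table in the output is a good place to check that both variables really have the same number of levels.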

 

SAS/STAT User's Guide
The GENMOD Procedure
CLASS Statement
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_genmod_syntax05.htm

 

Koen

 
sbxkoenk
SAS Super FREQ

Hello,

 

By using the LINK=LOG option with binomial data, you are trying to model the relative risk.

The more common logit link (LINK=LOGIT) results in a logistic model which allows you to estimate odds ratios rather than relative risks.  The logit link makes it much less likely to encounter such errors.
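
To make the difference between the two links concrete, here is a short worked sketch, where $p_i$ is the modeled probability for observation $i$ and $x_i^\top\beta$ is its linear predictor (notation introduced here for illustration only):

```latex
% Log link: nothing keeps the fitted mean below 1.
\log p_i = x_i^\top \beta
\quad\Longrightarrow\quad
p_i = \exp\!\left(x_i^\top \beta\right),
\qquad p_i > 1 \text{ whenever } x_i^\top \beta > 0 .

% Logit link: the fitted mean always stays strictly between 0 and 1.
\log\frac{p_i}{1 - p_i} = x_i^\top \beta
\quad\Longrightarrow\quad
p_i = \frac{1}{1 + \exp\!\left(-x_i^\top \beta\right)} \in (0, 1) .

% For a 0/1 predictor x with coefficient \beta_x:
\text{log link: } \exp(\beta_x) = \frac{p(x=1)}{p(x=0)} \ \text{(relative risk)},
\qquad
\text{logit link: } \exp(\beta_x) = \frac{p(x=1)/\{1 - p(x=1)\}}{p(x=0)/\{1 - p(x=0)\}} \ \text{(odds ratio)} .
```

So during fitting with the log link, any observation whose linear predictor drifts above 0 gets a mean outside the valid range, which is when the "mean parameter is either invalid or at a limit of its range" error can appear.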

If you must use the log link, see this usage note:

Usage Note 23003: Estimating a relative risk (also called risk ratio, prevalence ratio)

   http://support.sas.com/kb/23/003.html
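
One alternative that I believe the note covers is the modified Poisson approach (Poisson regression with robust standard errors). A minimal sketch, assuming longbi is numeric and coded 0/1, that each row is one subject, and using the hypothetical names work.mydata and id:

```sas
/* Sketch only: work.mydata, work.mydata_id and id are hypothetical names. */

/* REPEATED needs a subject identifier; create one if none exists
   (assumes one row per subject). */
data work.mydata_id;
  set work.mydata;
  id = _n_;
run;

proc genmod data=work.mydata_id;
  class id connectn / param=glm;
  /* Poisson with a log link: exp(coefficient) is still a relative risk,
     but the fit does not require predicted values to stay below 1. */
  model longbi = connectn / dist=poisson link=log;
  /* Independence working correlation gives robust (sandwich) standard errors. */
  repeated subject=id / type=ind;
  lsmeans connectn / diff exp cl;
run;
```

Note that this version models the mean of the 0/1 variable, that is, the probability of longbi=1. With dist=bin, GENMOD by default models the lower ordered response level unless you add the DESCENDING option, so check the "Response Profile" table before comparing results between the two fits.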

 

The answers to many questions can be found in the Samples and SAS Notes in our searchable knowledgebase, http://support.sas.com/kb .

You can use the search engine there to find the answers you need.

 

Koen

akimme
Obsidian | Level 7
Thanks for the database link! I've read through that note but many of its examples are for different models than what I need and it's hard for me to follow the notes without examples.

Could you direct me toward some sample code or more beginner friendly guides?
sbxkoenk
SAS Super FREQ

Maybe you could have a look here:

 

HOW CAN I ESTIMATE RELATIVE RISK IN SAS USING PROC GENMOD FOR COMMON OUTCOMES IN COHORT STUDIES? | SAS FAQ
https://stats.oarc.ucla.edu/sas/faq/how-can-i-estimate-relative-risk-in-sas-using-proc-genmod-for-co...

See the sections :

  • Relative risk estimation by log-binomial regression
  • Adjusting the relative risk for continuous or categorical covariates
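
For the second section, the general pattern looks roughly like the sketch below. It is not copied from that page: it reuses variable names from the original post, assumes age is a numeric, continuous variable, and uses the hypothetical data set name work.mydata.

```sas
/* Sketch only: work.mydata is a hypothetical name; age is assumed continuous. */
proc genmod data=work.mydata;
  /* Only categorical predictors go in CLASS; age is left out
     so that it is fitted as a single slope. */
  class connectn / param=glm;
  model longbi = connectn age / dist=bin link=log;
  /* Relative risks between connectn levels, adjusted for age. */
  lsmeans connectn / diff exp cl;
  /* Relative risk for a one-unit increase in age. */
  estimate 'RR per 1-unit increase in age' age 1 / exp;
run;
```

Keeping a many-valued variable like age out of the CLASS statement also means far fewer parameters to estimate, which may help with convergence.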

 

And here:

https://www.lexjansen.com/search/searchresults.php?q=relative%20risk%20log%20binomial%20regression%2...

 

SAS Tip: Learn lexjansen.com
https://communities.sas.com/t5/SAS-Tips-from-the-Community/SAS-Tip-Learn-lexjansen-com/m-p/436336#M2...

 

Good luck,
Koen

