noetsi
Obsidian | Level 7

I have looked for an answer to this in many places but have not found it yet. I am running a linear probability model (something I am new to, since my field normally uses logistic regression). My dependent variable has 2 levels, 0 and 1. All the predictors I am interested in have two levels as well (dummies coded 0 and 1). It is not clear to me whether the results from PROC GENMOD show the increased probability of being at level 1 of the dependent variable or at level 0. I ran this code (I don’t show the CLASS or MODEL statements because they are very long; I have 50-plus variables in the model).

 

PROC GENMOD DATA=WORK.SORTTempTableSorted
           PLOTS(ONLY)=None
;

 

I am using the defaults for everything. I assume that Genmod with the defaults predicts level 1 (shows the increased or decreased chances of being in level 1 on the DV), but I am not certain. I also assume that it leaves the coding of the dummy predictors the same, so a 0 remains a 0 and a 1 a one.

 

[Screenshot of PROC GENMOD results: noetsi_0-1626801000587.png]

 

 

So, reading the results above, if you are at level 0 of the predictor, the chance of being in level 1 of the DV is 28.39 percentage points lower.

 

You can reach me at Russell.Hellein@vr.fldoe.org. My site number is 70014208. I would very much appreciate your help. Historically I have used PROC REG, but its lack of a CLASS statement made it less than ideal, and PROC GENMOD was recommended instead.

StatDave
SAS Super FREQ

In both the log and the output (just below the Response Profile table), GENMOD tells you the level whose probability it is modeling. By default, it models the probability of level 0 for a 0,1 coded response. You should always use the EVENT= response option to specify the level you want. If you want to model the probability of 1, then specify MODEL Y(EVENT="1") = ... . It's the same with PROC LOGISTIC and other procedures that model a binary response.
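As a minimal sketch of the EVENT= option described above, assuming a hypothetical data set work.mydata with a 0/1 response y and numeric 0/1 predictors x1 and x2 (none of these names come from the original post):

```sas
/* Hypothetical names: work.mydata, y (0/1 response), x1 and x2 (0/1 predictors). */
proc genmod data=work.mydata;
   /* EVENT="1" requests a model for Pr(y = 1) instead of the default Pr(y = 0). */
   /* DIST=BINOMIAL is stated explicitly; the default link for it is the logit.  */
   model y(event="1") = x1 x2 / dist=binomial;
run;
```

Because x1 and x2 are entered here as numeric 0/1 variables rather than listed in CLASS, each estimate describes the change (on the link scale) in going from 0 to 1. Variables placed in a CLASS statement are parameterized against a reference level, so their reported estimates can refer to the other level.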

noetsi
Obsidian | Level 7

Thank you. When I ran it without specifying a distribution, it apparently does not model either level; I think it treats the DV as an interval-level variable. When I specified dist=binomial, it does tell you this. But that generated new issues. First, do I need to declare the dependent variable (DV) in the CLASS statement? And, much more important, I get an error and a lack of convergence, although results are generated.

 

ERROR: The mean parameter is either invalid or at a limit of its range for some observations.

 

This error I suspect means that there are probabilities beyond 0 and 1 which are impossible (and why I don't like LPM models). But I don't know how I can fix this problem, if I can, and if I can interpret the slopes that result as legitimate given this error and the lack of convergence. 

StatDave
SAS Super FREQ

If you don't specify the DIST= option, then by default the response is considered to be normally distributed, which would be wildly inappropriate for a binary response. Again, this is shown in the output. The response does not need to be specified in the CLASS statement, and I advise that you do not do so. That error is very common if you do not use the logit link (LINK=LOGIT) when DIST=BINOMIAL is specified. As you note, it occurs when some predicted values are not valid binomial means, which should be probabilities with values between 0 and 1. The logit link (like the probit and complementary log-log links) assures that the predicted values will be valid binomial means; the identity link does not. You do not state why you don't want to use the typical logit link with your binary response, but if the purpose is to assess the effect of each predictor directly on the event probability, then this is what predictive margins and marginal effects are for. Margins and marginal effects can be computed by the Margins macro. There are examples shown (and links to more are provided) in the Results tab of the Margins macro documentation.
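To make the point about invalid binomial means concrete, the two links can be written side by side. This is the standard textbook formulation, not output from the poster's model; here $\mu_i$ denotes the event probability for observation $i$ and $x_i'\beta$ the linear predictor.

```latex
\begin{align*}
\text{identity link:}\quad & \mu_i = x_i'\beta,
  && \text{unbounded, so } \mu_i \le 0 \text{ or } \mu_i \ge 1 \text{ is possible} \\
\text{logit link:}\quad & \mu_i = \frac{1}{1 + e^{-x_i'\beta}},
  && \text{so } 0 < \mu_i < 1 \text{ for every value of } x_i'\beta
\end{align*}
```

With the identity link, nothing constrains the fitted $x_i'\beta$ to stay strictly between 0 and 1, which is the situation the "mean parameter is either invalid or at a limit of its range" error reports.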

noetsi
Obsidian | Level 7

Thank you for your response. I would infinitely prefer to run a logistic regression model in this case. I was taught to do that because of the well-known errors when you have a binary DV, which you note. However, the federal government has decided for our agency that LPM models have to be run, so there is no option to use the logit link (that is, to run logistic regression). Although I do not know the details, many economists believe that LPM models are fine (meaning linear models with a binary dependent variable), and I think this process was determined by economists. I have the whole population, so the SEs really don't matter to me at all, only the accuracy of the slopes; I ignore the statistical test of the slopes since I have the population. Although the error I mentioned caused the SEs not to be reported, it does generate slopes. Some have suggested the slopes will still be valid despite the error, but I have to get more details.

 

As a really basic point, is specifying / LINK=IDENTITY DIST=BINOMIAL the correct way to generate the linear probability model in PROC GENMOD? This is actually the first time I have not used PROC LOGISTIC for a binary DV, so I want to get the basics right. And how in the code do you request the odds ratios and predicted probabilities? I was hoping to plot them so as to argue with the feds that LPM might be unreasonable. 🙂

StatDave
SAS Super FREQ (accepted solution)

People use terms in different ways, but yes, I would call the model with DIST=BIN and LINK=IDENTITY a linear probability model, since the model is E(Y)=x'beta, where Y is distributed as binomial and E(Y) is its mean for a given setting of x. Odds ratios are only available when the logit link is used, since the logit is the log odds and the difference of logits is a log odds ratio. Exponentiating the parameter of a binary predictor then gives an estimate of an odds ratio. Predicted event probabilities are easily available from the P= option in the OUTPUT statement.
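A rough sketch of how these pieces might look in code, assuming a hypothetical data set work.mydata with a 0/1 response y and numeric 0/1 predictors x1 and x2 (the data set, variable, and output names are placeholders, not the poster's):

```sas
/* Hypothetical names throughout: work.mydata, y, x1, x2, lpm_pred, logit_pred. */

/* Linear probability model: binomial response with the identity link. */
proc genmod data=work.mydata;
   model y(event="1") = x1 x2 / dist=binomial link=identity;
   output out=work.lpm_pred p=p_lpm;      /* predicted Pr(y = 1) per observation */
run;

/* Logistic model for comparison: odds ratios require the logit link. */
proc logistic data=work.mydata;
   model y(event="1") = x1 x2;            /* prints an Odds Ratio Estimates table */
   output out=work.logit_pred p=p_logit;  /* predicted Pr(y = 1) per observation */
run;
```

The two output data sets could then be compared or plotted to see where the identity-link predictions stray from those of the logistic model.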
