
Logistic Regression with Categorical Independent Variables

Contributor
Posts: 36

Logistic Regression with Categorical Independent Variables

I'm running a logistic regression on an alumni population to identify which factors relate to the odds of giving. For gender, I have a variable that I coded 1/0, so it's binary. If I want to include degree (i.e., BA, BS, MBA, and PHD), do I create four binary variables, so that someone with a BA has a 1 in the BA column and a 0 for BS, MBA, and PHD? I just want to make sure I code it correctly. Thank you!
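A minimal sketch of manual (reference) coding, assuming a hypothetical dataset ALUMNI with a character variable DEGREE holding BA, BS, MBA, or PHD. With an intercept in the model, a variable with k levels needs only k-1 indicators: four indicators would always sum to 1 and be perfectly collinear with the intercept. Gender coded 1/0 is the same idea with two levels and one indicator. Choosing PHD as the omitted (reference) level here is arbitrary.

```sas
/* Hypothetical names: dataset ALUMNI with variables DEGREE ('BA', 'BS', */
/* 'MBA', 'PHD'), GENDER (1/0) and DONOR (1/0)                           */
data alumni_coded;
   set alumni;
   /* A SAS comparison evaluates to 1 (true) or 0 (false) */
   deg_BA  = (degree = 'BA');
   deg_BS  = (degree = 'BS');
   deg_MBA = (degree = 'MBA');
   /* No deg_PHD indicator: PHD is the reference level */
   /* Keep a missing degree missing instead of silently coding it as PHD */
   if missing(degree) then call missing(deg_BA, deg_BS, deg_MBA);
run;

proc logistic data=alumni_coded;
   model donor(event='1') = gender deg_BA deg_BS deg_MBA;
run;
```

Each indicator's odds ratio is then relative to PHD.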


Super User
Posts: 19,194

Re: Logistic Regression with Categorical Independent Variables

Use the CLASS statement for categorical variables. 

Check the documentation; there's an example of dealing with categorical variables. 
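As a sketch, assuming the same hypothetical ALUMNI dataset with DEGREE, GENDER, and DONOR, the CLASS statement builds the k-1 design variables for DEGREE itself. Note that without PARAM=REF, PROC LOGISTIC uses effect coding by default, so the estimates would not be simple dummy-variable contrasts.

```sas
/* Hypothetical names: dataset ALUMNI, variables DEGREE, GENDER, DONOR */
proc logistic data=alumni;
   /* PARAM=REF requests reference (dummy, 0/1) coding;  */
   /* REF='PHD' makes PHD the omitted comparison level   */
   class degree(ref='PHD') / param=ref;
   model donor(event='1') = gender degree;
run;
```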

Contributor
Posts: 36

Re: Logistic Regression with Categorical Independent Variables

I used the CLASS statement in a simple logistic regression model with only two independent variables, Preferred_Prof_Suffix and AlsoParent. Their values were Y or N. When the model was run, the output showed Preferred_Prof_Suffix N and AlsoParent N. Does this mean the output shows negative responses toward the dependent variable? Should I interpret the output in terms of the reciprocal, or is there a way for the independent variables to show up as Y in the output?

Code used:

proc logistic data=work.work;
class Preferred_Prof_Suffix AlsoParent;
model donor(event='1') = Preferred_Prof_Suffix AlsoParent;
run;

Trusted Advisor
Posts: 1,800

Re: Logistic Regression with Categorical Independent Variables


It's great you show us the code that you are using, but we can't see what you are seeing in the output. Please show us the output so we can understand what the question is.

Contributor
Posts: 36

Re: Logistic Regression with Categorical Independent Variables


Forgot to attach it. Here is the output.

Thanks.

My dependent variable is Donor (1 for yes, 0 for no). When I run this code, it models the event 1, which is a donor, but the IVs show up as N. It seems like my output would read "these are the log odds of a person not having this IV if they are a donor," which is awkward to explain, since the goal is NOT to look at the IVs the population lacks. It would make more sense to describe it as "these are the log odds of a person HAVING this IV and donating." I'm just not sure whether there's a way in the code to have the Y value of each IV come up in the output instead of the N value.

Hope I explained it well.


[Attachment: output.jpg]
Trusted Advisor
Posts: 1,800

Re: Logistic Regression with Categorical Independent Variables

The documentation is very clear about this:

"PROC LOGISTIC detects linear dependency among the last two design variables and sets the parameter for A2(B=2) to zero, resulting in an interpretation of these parameters as if they were reference- or dummy-coded. The REFERENCE or GLM parameterization might be more appropriate for such problems."

So there is no separate parameter for the Y levels: Y sorts after N, so it is treated as the reference level of the design.
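To see what the "N" rows mean, here is a worked sketch assuming PROC LOGISTIC's default effect coding (PARAM=EFFECT) and that Y sorts after N, so Y is the last (reference) level. A two-level CLASS variable then gets a single design variable x, coded +1 for N and -1 for Y:

```latex
\operatorname{logit} P(\text{donor}=1) = \alpha + \beta x,
\qquad
x = \begin{cases} +1 & \text{level N} \\ -1 & \text{level Y} \end{cases}

\log \mathrm{OR}_{\mathrm{N\ vs\ Y}} = (\alpha + \beta) - (\alpha - \beta) = 2\beta,
\qquad
\mathrm{OR}_{\mathrm{Y\ vs\ N}} = e^{-2\beta}
```

Under effect coding, the estimate printed on the "N" row is therefore half the N-vs-Y log odds ratio. With PARAM=REF (x = 1 for N, 0 for Y), the same row would be the log odds ratio itself, so OR(N vs Y) = exp(β). The Odds Ratio Estimates table should account for the coding either way.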

Contributor
Posts: 36

Re: Logistic Regression with Categorical Independent Variables

It seems like GLM is for linear models and is different from a logistic model. If I simply reversed the coding in my dataset so that Y meant no and N meant yes, would that solve the problem?
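Reversing the Y/N labels just flips the comparison. A short worked sketch, writing b for the log odds ratio of N versus Y for one predictor:

```latex
\log \mathrm{OR}_{\mathrm{Y\ vs\ N}} = -\log \mathrm{OR}_{\mathrm{N\ vs\ Y}} = -b,
\qquad
\mathrm{OR}_{\mathrm{Y\ vs\ N}} = e^{-b} = \frac{1}{\mathrm{OR}_{\mathrm{N\ vs\ Y}}}
```

For example, an odds ratio of 0.5 for N vs Y is the same finding as an odds ratio of 2 for Y vs N, with an unchanged Wald chi-square and p-value. Recoding would work, but reading the reciprocal, or choosing the reference level with the REF= option on the CLASS statement, gives the same result without editing the data.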

Super User
Posts: 19,194

Re: Logistic Regression with Categorical Independent Variables

You can set the reference level for the dependent variable as well.

model dep_var(event='Yes') = ....;
Super User
Posts: 19,194

Re: Logistic Regression with Categorical Independent Variables


mmagnuson wrote:

Forgot to attach it. Here is the output.

Thanks.

My dependent variable is Donor (1 for yes, 0 for no). When I run this code, it models the event 1, which is a donor, but the IVs show up as N. It seems like my output would read "these are the log odds of a person not having this IV if they are a donor," which is awkward to explain, since the goal is NOT to look at the IVs the population lacks. It would make more sense to describe it as "these are the log odds of a person HAVING this IV and donating." I'm just not sure whether there's a way in the code to have the Y value of each IV come up in the output instead of the N value.

Hope I explained it well.


Are you mixing the terms independent and dependent variables here?

Contributor
Posts: 36

Re: Logistic Regression with Categorical Independent Variables

No.

I am trying to figure out what factors contribute to someone making a donation or not. My dependent variable is donor, and my independent variables are Preferred Professional Suffix and AlsoParent; I want to find out whether those who have a professional suffix or are also parents have greater odds of being a donor.

Does that make sense?

Solution
05-05-2017 11:45 AM
Super User
Posts: 19,194

Re: Logistic Regression with Categorical Independent Variables

No, I think it's how you're phrasing things.

At any rate, you can control the reference levels for both the independent and dependent variables, so you can have the odds ratios come out exactly as you want, as long as you define your model correctly.
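Applied to this thread's model, a sketch might look like the following. It reuses the dataset and variable names from the code posted earlier and assumes the two IVs are stored as the character values 'Y' and 'N'. With REF='N', the estimates and odds ratios should be reported for the Y level (Y vs N), i.e., the odds of donating for someone who has the attribute compared with someone who doesn't.

```sas
proc logistic data=work.work;
   /* REF='N' makes N the reference level, so each predictor is     */
   /* reported for its Y level; PARAM=REF gives dummy (0/1) coding, */
   /* so exp(estimate) is the Y-vs-N odds ratio                     */
   class Preferred_Prof_Suffix(ref='N') AlsoParent(ref='N') / param=ref;
   /* EVENT='1' models the probability that donor = 1 */
   model donor(event='1') = Preferred_Prof_Suffix AlsoParent;
run;
```

Without PARAM=REF, REF='N' alone would still make the Y level appear in the output, but the estimates would remain effect-coded.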


Contributor
Posts: 36

Re: Logistic Regression with Categorical Independent Variables

Sorry that I wasn't clear but luckily you were! Thank you! 
