Logistic Regression with Categorical Independent V...

05-03-2017 02:41 PM

I'm running a logistic regression on an alumni population to identify which factors relate to the odds of giving. For gender I have a variable coded (1,0), so it's binary. If I want to include degrees (i.e., BA, BS, MBA, and PHD), do I create 4 binary variables, so that someone with a BA would have 1 in the BA column but 0 for BS, MBA, and PHD? I just want to make sure I code it correctly. Thank you!

All Replies

05-03-2017 03:25 PM

Use the CLASS statement for categorical variables.

Check the documentation; it includes an example of dealing with categorical variables.
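
As a minimal sketch of what that could look like for the degree question above: with CLASS, a single character variable holding the degree is enough, and PROC LOGISTIC builds the design (dummy) columns itself, so four hand-made 0/1 columns are not needed. The data set name `work.alumni` and the variable names `degree` and `gender` are assumptions for illustration. `PARAM=REF` and `REF=` are CLASS statement options that request reference (dummy) coding, here with BA as the baseline.

```sas
/* Sketch only: work.alumni, degree, and gender are hypothetical names. */
proc logistic data=work.alumni;
  /* degree holds 'BA', 'BS', 'MBA', or 'PHD'; BA is the reference level, */
  /* so three design columns (BS, MBA, PHD) are created, not four.        */
  class degree(ref='BA') / param=ref;
  /* gender is already coded 0/1, so it can enter the model directly. */
  model donor(event='1') = gender degree;
run;
```

If you do code the indicators by hand, and each person has exactly one degree, include only three of the four in the model; the fourth would be perfectly collinear with the intercept.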

05-05-2017 09:22 AM

I used the CLASS statement in a simple logistic regression model with only two independent variables, Preferred_Prof_Suffix and AlsoParent. Their values are Y or N. When the model ran, the output showed Preferred_Prof_Suffix N and AlsoParent N. Does this mean the output shows negative responses toward the dependent variable? Should I interpret the output in terms of the reciprocal, or is there a way for the independent variables to show up as Y in the output?

Code used:

proc logistic data=work.work;

class Preferred_Prof_Suffix AlsoParent;

model donor(event='1') = Preferred_Prof_Suffix AlsoParent;

run;

05-05-2017 09:51 AM

mmagnuson wrote:

I used the CLASS statement in a simple logistic regression model with only two independent variables, Preferred_Prof_Suffix and AlsoParent. Their values are Y or N. When the model ran, the output showed Preferred_Prof_Suffix N and AlsoParent N. Does this mean the output shows negative responses toward the dependent variable? Should I interpret the output in terms of the reciprocal, or is there a way for the independent variables to show up as Y in the output?

Code used:

proc logistic data=work.work;

class Preferred_Prof_Suffix AlsoParent;

model donor(event='1') = Preferred_Prof_Suffix AlsoParent;

run;

It's great that you showed us the code you are using, but we can't see what you are seeing in the output. Please show us the output so we can understand the question.

05-05-2017 09:53 AM - edited 05-05-2017 10:00 AM

Forgot to attach it. Here is the output.

Thanks.

My dependent variable is Donor (1 for yes and 0 for no). If I run this code, it looks for an event of 1, which would be a donor, but the IVs show up as No. It seems like my output would read, "These are the log odds of a person not having this IV if they are a donor," which is awkward to explain because it describes people who do NOT have the IVs. It would make more sense to say, "These are the log odds of a person HAVING this IV and donating." I'm just not sure whether there's a way in the code to have the Y value of the IV come up in the output instead of the N value.

Hope I explained it well.

05-05-2017 10:02 AM

The documentation is very clear about this:

"PROC LOGISTIC detects linear dependency among the last two design variables and sets the parameter for A2(B=2) to zero, resulting in an interpretation of these parameters as if they were reference- or dummy-coded. The REFERENCE or GLM parameterization might be more appropriate for such problems."

So the parameters for the Y levels are set to zero because of the design.
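
For context, the CLASS statement in PROC LOGISTIC defaults to effect coding (`PARAM=EFFECT`) with the last ordered level as the reference (`REF=LAST`). Y sorts after N, which is why only the N rows are listed in the output. The sketch below reuses the code posted earlier in the thread and switches to reference coding, as the quoted passage suggests. Under that coding the N estimate is the difference in log odds of donating between the N and Y groups.

```sas
proc logistic data=work.work;
  /* PARAM=REF requests reference (dummy) coding; the default REF=LAST */
  /* still makes Y the reference level, so the output lists level N.   */
  class Preferred_Prof_Suffix AlsoParent / param=ref;
  model donor(event='1') = Preferred_Prof_Suffix AlsoParent;
run;
```

Both `PARAM=REF` and `PARAM=GLM` are options of the CLASS statement within PROC LOGISTIC; the model fitted is still a logistic regression.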

05-05-2017 10:15 AM

It seems like GLM is meant for linear models and is different from a logistic model. If I simply reversed the coding in my dataset, so that Y meant no and N meant yes, would that solve the problem?

05-05-2017 10:41 AM

You can set the reference level for the dependent variable as well.

`model dep_var(event='Yes') = ....;`

05-05-2017 10:43 AM

mmagnuson wrote:

Forgot to attach it. Here is the output.

Thanks.

My dependent variable is Donor (1 for yes and 0 for no). If I run this code, it looks for an event of 1, which would be a donor, but the IVs show up as No. It seems like my output would read, "These are the log odds of a person not having this IV if they are a donor," which is awkward to explain because it describes people who do NOT have the IVs. It would make more sense to say, "These are the log odds of a person HAVING this IV and donating." I'm just not sure whether there's a way in the code to have the Y value of the IV come up in the output instead of the N value.

Hope I explained it well.

Are you mixing up the terms independent and dependent variables here?

05-05-2017 10:51 AM

No.

I am trying to figure out what factors contribute to someone making a donation or not. My dependent variable is donor, and my independent variables are Preferred Professional Suffix and AlsoParent. I want to find out whether those who have a professional suffix or are also parents have greater odds of being a donor.

Does that make sense?

Solution

05-05-2017
11:45 AM

05-05-2017 10:53 AM

No, I think it's how you're phrasing things.

At any rate, you can control the reference levels for both independent and dependent variables, so you can have the odds ratios come out exactly as you want, as long as you define your model correctly.
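
A hedged sketch of that advice applied to the code from earlier in the thread: `REF='N'` on each CLASS variable makes N the baseline, so the Y level is what appears in the output, and `EVENT='1'` keeps donating as the modeled outcome. This assumes both variables hold exactly the values 'Y' and 'N'.

```sas
proc logistic data=work.work;
  /* N is the reference level, so estimates and odds ratios are Y versus N. */
  class Preferred_Prof_Suffix(ref='N') AlsoParent(ref='N') / param=ref;
  /* event='1' models the probability that donor = 1. */
  model donor(event='1') = Preferred_Prof_Suffix AlsoParent;
run;
```

With this setup, an odds ratio above 1 for a variable indicates higher odds of donating for the Y group than for the N group.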

05-05-2017 11:45 AM

Sorry that I wasn't clear but luckily you were! Thank you!