## Logistic regression or GLM

Occasional Contributor
Posts: 6

I am new to logistic and GLM procedures, and therefore I have some syntactical and conceptual questions:

I have a dataset (attached to this post) with information about the salary and other important characteristics of all faculty (n=52) in a college. The variables are described as follows:

OBS: observation #

SX: sex (0=Male, 1=Female)

RK: rank (1=Assistant Professor, 2=Associate Professor, 3=Full Professor)

YR: # years in current rank

DG: highest degree (0=Masters, 1=Doctorate)

YD: # years since highest degree earned

SL: academic year salary

I need to determine if gender is associated with rank, highest degree, number of years in current rank, number of years since highest degree earned, and academic year salary.

Since gender is a binary outcome, I have used logistic regression to address the question. However, all my predictors come out highly significant, which does not look correct. Am I approaching this question correctly, or is my syntax wrong? Should I be using GLM instead?

My code is as follows:

proc logistic data=discrimination;

freq yd;

freq yr;

class rk dg;

model sx(descending) =rk yr dg yd sl;

run;

Another question that I am addressing is:

2. Is there a significant relationship between rank and academic year salary?

I am using a simple regression model, with rank as X (categorical) and salary as Y (continuous). Am I doing this correctly?

Below is the code:

proc reg data=discrimination SIMPLE;

model SL = rk;

run;
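
One point worth checking in the code above: PROC REG has no CLASS statement, so `model SL = rk` treats the codes 1, 2, 3 as a numeric scale and fits a single slope. A minimal sketch of the alternative, treating rank as a three-level category with PROC GLM, assuming the same data set and variable names (the Tukey adjustment is an illustrative choice, not something the question specifies):

```sas
/* One-way comparison of salary across the three ranks.
   Assumes the DISCRIMINATION data set and variable names used above. */
proc glm data=discrimination;
   class rk;                            /* rank as categorical, not a numeric slope */
   model sl = rk / solution;            /* overall F test for any rank effect       */
   lsmeans rk / pdiff adjust=tukey cl;  /* pairwise differences between ranks       */
run;
quit;
```

The overall F test answers whether salary differs by rank at all; the LSMEANS output shows which ranks differ from which.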

Super Contributor
Posts: 644

## Re: Logistic regression or GLM

A few initial remarks:

Your use of the freq statement is incorrect. You would only use it if a single observation in your data stood for n identical instances, with the freq variable holding that count n.
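
To make the distinction concrete, here is a minimal sketch. The `grouped` data set, its `count` variable, and the counts themselves are made up purely for illustration; only `discrimination` and its variables come from the original post.

```sas
/* Case 1: aggregated data, where FREQ is appropriate.
   Each row stands for COUNT identical people (hypothetical numbers). */
data grouped;
   input sx dg count;
   datalines;
0 0 10
0 1 20
1 0 5
1 1 15
;
run;

proc logistic data=grouped;
   freq count;                      /* row is used COUNT times in the fit */
   class dg / param=ref;
   model sx(descending) = dg;
run;

/* Case 2: one row per faculty member, as in DISCRIMINATION.
   No FREQ statement: every row already counts exactly once. */
proc logistic data=discrimination;
   class rk dg / param=ref;
   model sx(descending) = rk yr dg yd sl;
run;
```

With `freq yd` or `freq yr`, each person is counted as many times as their number of years, and rows where that value is less than 1 are not used at all. That inflates the apparent sample size, which would be consistent with every predictor looking highly significant.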

Modelling sex as if it were a dependent attribute is a bit perverse. I would expect you to model rank or salary on some set of the other indicators. To use logistic regression on salary you would first have to segment it into categories, perhaps low, mid and high.
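
Both options could be sketched roughly as follows, assuming the variable names from the original post. The data set `sal_groups` and the variable `sl_grp` are hypothetical names introduced here, and splitting salary into thirds is just one possible segmentation.

```sas
/* Option A: salary as a continuous outcome, sex as one of the predictors. */
proc glm data=discrimination;
   class sx rk dg;
   model sl = sx rk dg yr yd / solution ss3;
   lsmeans sx / pdiff cl;           /* adjusted male/female salary difference */
run;
quit;

/* Option B: segment salary into three groups, then use logistic regression. */
proc rank data=discrimination groups=3 out=sal_groups;
   var sl;
   ranks sl_grp;                    /* 0 = low, 1 = mid, 2 = high */
run;

proc logistic data=sal_groups;
   class sx rk dg / param=ref;
   /* With three ordered response levels PROC LOGISTIC fits a
      cumulative logit (proportional odds) model by default. */
   model sl_grp = sx rk dg yr yd;
run;
```

Option A keeps all the information in salary; option B throws some of it away by grouping, so with only 52 observations the continuous model is probably the easier one to defend.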

I'll leave the finer points to others.

Richard in Oz

Occasional Contributor
Posts: 6

## Re: Logistic regression or GLM

Hi Richard

Thanks for replying to my post and for the suggestion not to use freq. With regard to using gender as the outcome variable, I agree with your point of view; however I cannot change what the question requires. So will have to work with gender as outcome.

Super User
Posts: 20,765

## Re: Logistic regression or GLM

You state your question as:

I need to determine if gender is associated with rank, highest degree, number of years in current rank, number of years since highest degree earned, and academic year salary.

What did your univariate comparisons show for each variable, before you fit the multivariate model?
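
A minimal sketch of such univariate comparisons, assuming the data set and variable names from the original post (the choice of tests is illustrative):

```sas
/* Categorical variables: cross-tabulate sex against rank and degree. */
proc freq data=discrimination;
   tables sx*(rk dg) / chisq fisher;   /* Fisher's exact test helps with small cells */
run;

/* Continuous variables: compare men and women on years and salary. */
proc ttest data=discrimination;
   class sx;
   var yr yd sl;
run;

/* Rank-based alternative in case the distributions are skewed. */
proc npar1way data=discrimination wilcoxon;
   class sx;
   var yr yd sl;
run;
```

Each of these tests an association between gender and one other variable without requiring gender to be the response in a regression model.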

Second, I'm with RichardinOz: your outcome shouldn't be gender. Gender is an independent variable; the outcome is something else.

Asking whether gender is associated with these variables doesn't mean gender has to be the dependent variable.

Super Contributor
Posts: 644

## Re: Logistic regression or GLM

You say "I cannot change what the question requires. So will have to work with gender as outcome."

I disagree.  Having gender as the outcome implies you have a population which undergoes sex change as it progresses through rank, academic outcome and salary.

As an analyst you have a responsibility to challenge incorrect assumptions.  Otherwise you are little better than a 'script kiddie' throwing code at data in the hope that something sticks.  Who is asking the question?  Go back to them and get them to restate the problem.

Richard in Oz
