Logistic regression or GLM

I am new to logistic and GLM procedures, and therefore I have some syntactical and conceptual questions:


I have a dataset (attached to this post) that contains the salary and several other characteristics of all faculty (n=52) in a college. The variables are described as follows:

OBS: observation #

SX: sex (0=Male, 1=Female)

RK: rank (1=Assistant Professor, 2=Associate Professor, 3=Full Professor)

YR: # years in current rank

DG: highest degree (0=Masters, 1=Doctorate)

YD: # years since highest degree earned

SL: academic year salary ($)

I need to determine if gender is associated with rank, highest degree, number of years in current rank, number of years since highest degree earned, and academic year salary.

Since gender is a binary outcome, I have used logistic regression to address the question. However, I am getting a result where all my predictors appear highly significant, which does not look correct. Am I approaching this question correctly, or is my syntax wrong? Should I be using GLM instead?


My code is as follows:

proc logistic data=discrimination;
  freq yd;
  freq yr;
  class rk dg;
  model sx(descending) = rk yr dg yd sl;
run;

Another question that I am addressing is:

2. Is there a significant relationship between rank and academic year salary?

I am using a simple regression model. Here I have assigned rank as X (categorical) and salary as Y (continuous). Am I doing this correctly?

Below is the code:

proc reg data=discrimination SIMPLE;
  model SL = rk;
run;

Thanks in advance for your suggestions!


Re: Logistic regression or GLM

Actually, I am also a rookie when it comes to statistical theory, but I don't understand why you want to use yd and yr as FREQ variables. That can't be right. Your PROC LOGISTIC code also doesn't look right; did you check the documentation?

PROC REG treats every predictor as a numeric (continuous) variable and has no CLASS statement for categorical predictors, so I don't think it is a good idea here. You should try PROC GLM instead.
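As a rough sketch of what that could look like (assuming the dataset and variable names from the original post, and that rank should be treated as a three-level category rather than a number), something along these lines fits salary on rank with PROC GLM. The MEANS statement with TUKEY is an optional extra, not something the original poster asked for:

```sas
/* Sketch: one-way ANOVA of salary on rank, with rank as a categorical predictor */
proc glm data=discrimination;
  class rk;             /* rk: 1=Assistant, 2=Associate, 3=Full Professor */
  model sl = rk;        /* overall F test: do mean salaries differ by rank? */
  means rk / tukey;     /* optional: pairwise comparisons of the rank means */
run;
quit;
```

With `model SL = rk;` in PROC REG, rank would instead enter as a single numeric slope, which assumes the salary step from rank 1 to 2 equals the step from 2 to 3.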

Ksharp


Re: Logistic regression or GLM

While this method may work (in the sense that you get a solution), I think you might have reversed the roles of independent and dependent variable, based on your statement "I need to determine if gender is associated with rank, etc." I would think that you just want to know whether the average rank, number of years, etc. differ between males and females. Thus, for the ordinal responses (rank and highest degree), PROC FREQ would probably be the most straightforward analysis. For the interval responses (YR, YD, SL), each taken as the dependent variable, I would start with PROC GLM, but pay particular attention to the distribution of the residuals. If the residuals deviate a lot from normality (and I would use QQ plots to determine this rather than normality tests), I would move to a procedure that can model the distribution directly, such as PROC GENMOD or PROC GLIMMIX.
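A minimal sketch of that approach, assuming the dataset and variable names from the original post; the choice of CHISQ and EXACT options and of salary as the first response is only illustrative:

```sas
ods graphics on;

/* Categorical characteristics: cross-tabulate sex against rank and degree */
proc freq data=discrimination;
  tables sx*rk sx*dg / chisq exact;   /* exact test because cell counts may be small with n=52 */
run;

/* Interval characteristic: compare mean salary between the sexes */
proc glm data=discrimination plots=diagnostics;
  class sx;
  model sl = sx;    /* repeat with yr and yd as the response */
run;
quit;
```

The diagnostics panel includes a QQ plot of the residuals, which is the place to look before deciding whether GENMOD or GLIMMIX is needed.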

Steve Denham


Re: Logistic regression or GLM

Posted in reply to SteveDenham

Biobee,

In addition to Steve's comments, I would also caution that your sample size is very small (N=52), so you are unlikely to be able to do more than univariable analyses.

[The reason that everything was significant in your initial PROC LOGISTIC is the FREQ statements. A FREQ statement names a variable whose value is used as a count of how many times each observation occurs, so your effective sample size became far larger than the actual 52.]
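For illustration only, here is the original model with the FREQ statements removed, assuming the same dataset and variable names as in the first post. The `event='1'` and `param=ref` options are my own choices for readability, not part of the original code:

```sas
/* Sketch: same model as the original post, without FREQ statements */
proc logistic data=discrimination;
  class rk dg / param=ref;               /* reference-cell coding for the categorical predictors */
  model sx(event='1') = rk yr dg yd sl;  /* models the probability that sx = 1 (female) */
run;
```

In the output, "Number of Observations Used" should now be at most 52. Given the sample size, fitting one predictor at a time (for example `model sx(event='1') = sl;`) is probably as far as these data will stretch.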

Doc Muhlbaier

Duke
