## Proc Logistic result not include ordinal variables

Dear all,

Here's the code.

1.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2  N3  N4  N5  N6  N7  N10  N11  N12  N13 /param=ref;

model dv = Prin1 Prin2 Prin3  factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

run;

2.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2  N3  N4  N5  N6  N7  N10  N11  N12  N13 /param=effect;

model dv = Prin1 Prin2 Prin3  factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

run;

3.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2  N3  N4  N5  N6  N7  N10  N11  N12  N13 /param=ref;

model dv = Prin1 Prin2 Prin3  factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

units Prin1=50000 Prin2=50000 Prin3=50000;

run;

I tried the three versions above, but none of them brought an N variable into the model.

The variables with the prefix "N" are ordinal variables such as nationality and sex; most of them take values in the range 0 to 9.

Prin1-Prin3 are variables extracted by principal component analysis; their values range between -1 million and +1 million.

factor1-factor8 are variables extracted by factor analysis; their values range between -2 and +2.

Both sets summarize continuous variables in some way.

dv is the dependent variable: 1 means the customer will leave, and 0 means the customer will stay.

The problem is that with stepwise selection only Prin1 and some of the factor variables remain; not a single N variable remains.

Judging from the real business, though, at least nationality is very useful for determining whether a customer will leave.

Why does not even one N variable remain?

What's wrong with my code for PROC LOGISTIC?

Dawn

1 ACCEPTED SOLUTION

## Re: Proc Logistic result not include ordinal variables

You have specified the variables with an N prefix in the CLASS statement but not as independent variables in the MODEL statement.

PROC LOGISTIC will select only independent variables from the MODEL statement so that it will not select any of the N-prefix variables.
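A minimal sketch of the first attempt with the N-prefix variables listed in the MODEL statement as well as in the CLASS statement, so that stepwise selection can consider them. The data set and variable names are taken from the question; the `event='1'` response option is an added assumption (it asks PROC LOGISTIC to model the probability that dv = 1, that is, that the customer leaves) and can be dropped.

```sas
proc logistic data=slide.sb_vm_training outmodel=slide.model;
   /* CLASS only declares these variables as categorical */
   class N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 / param=ref;

   /* They must also appear in MODEL to be candidates for selection. */
   /* event='1' is an assumption: model the probability of leaving.  */
   model dv(event='1') = N2 N3 N4 N5 N6 N7 N10 N11 N12 N13
                         Prin1 Prin2 Prin3
                         factor1 factor2 factor3 factor4
                         factor5 factor6 factor7 factor8
                         / selection=stepwise;
run;
```

Whether any N variable survives selection still depends on the data and on the entry and stay significance levels; listing them in MODEL only makes them eligible.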

You also state that the N-prefix variables are ordinal variables but provide as examples only nominal variables (nationality, sex).

Generally, you should not perform principal components analysis or factor analysis on nominal variables but preferably only on interval/ratio/continuous variables.

Reference coding is preferred to effect coding in the PROC LOGISTIC CLASS statement because the former is easier to translate into measures of effect (like odds ratios) than the latter.  Variable selection in regression procedures has been discussed previously in this forum and is somewhat problematic.  Preferable would be some of the methods in PROC GLMSELECT, even though these methods are not optimized for dichotomous dependent variables like those used in logistic regression.
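As a rough illustration of the PROC GLMSELECT suggestion, the sketch below screens the same candidate variables with LASSO and picks the model by cross validation. The choice of LASSO, the five random folds, and the seed value are assumptions made for the example, not something stated in this thread. PROC GLMSELECT fits a linear model, so the 0/1 dv is treated as a continuous response here; one possible workflow is to use the result only to shortlist variables and then refit the shortlisted model in PROC LOGISTIC.

```sas
/* Assumed settings: LASSO, 5-fold random cross validation, arbitrary seed */
proc glmselect data=slide.sb_vm_training seed=12345;
   class N2 N3 N4 N5 N6 N7 N10 N11 N12 N13;

   /* dv is treated as a continuous response by this procedure */
   model dv = N2 N3 N4 N5 N6 N7 N10 N11 N12 N13
              Prin1 Prin2 Prin3
              factor1 factor2 factor3 factor4
              factor5 factor6 factor7 factor8
              / selection=lasso(choose=cv stop=none)
                cvmethod=random(5);
run;
```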

## Re: Proc Logistic result not include ordinal variables

1zmm,

Yes, the variables with the N prefix are nominal variables.

The Prin and factor variables are generated from continuous variables only.

After I raised this question, I looked around the community and found that PROC LOGISTIC combined with a CLASS statement is not recommended; people suggest PROC GLMSELECT, as you said.

I have one more question. Could you take the time to answer it?

I used the following code (45 continuous variables):

proc princomp data=slide.sb_vm10 cov outstat=temp_prin1;

var  c1-c45;

run;

For example, variable A has a large range, within (-1M, 1M), and variable B has a small range, within (-1, 1).

It seems that the eigenvector coefficients (for Prin1, for example) are zero for variables like B.

Do you know how to deal with this?

## Re: Proc Logistic result not include ordinal variables

Of course the coefficient is zero, or nearly so.  Variable A explains almost all of the total variation, so the amount of variation left for Variable B is negligible.  If you look at the eigenvalues associated with the vectors this should be apparent.

The question comes down to RELATIVE variability, so perhaps rescaling would help.  Not normalizing, as that will remove differences in variability.  Just putting things on the same scale will help.
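One way to read "putting things on the same scale" is a plain change of units before the covariance-based analysis. The sketch below assumes, purely for illustration, that c1-c10 are the large-range variables and that dividing them by one million is a sensible unit change; the thread does not say which variables are large, and the output data set name `work.sb_vm10_rescaled` is made up.

```sas
/* Hypothetical: c1-c10 are assumed to be the large-range variables */
data work.sb_vm10_rescaled;
   set slide.sb_vm10;
   array big {*} c1-c10;
   do _i = 1 to dim(big);
      big{_i} = big{_i} / 1e6;   /* express in millions (assumed unit change) */
   end;
   drop _i;
run;

/* Same covariance-based analysis as before, on the rescaled data */
proc princomp data=work.sb_vm10_rescaled cov outstat=temp_prin1;
   var c1-c45;
run;
```

Dividing by a fixed constant keeps the relative variability among variables that share the same original units. Dropping the COV option would make PROC PRINCOMP analyze the correlation matrix instead, which amounts to standardizing every variable to unit variance.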

That said, I don't really know how you intend to use the results in forecasting a time series.

Steve Denham

## Re: Proc Logistic result not include ordinal variables

Steve,

I'm glad I saw your recommendation of "not normalizing" but just "rescaling". I was just about to normalize. You saved me. Thanks.

Dawn
