BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bbb_NG
Fluorite | Level 6


Dear all,

Here's the code.

1.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2  N3  N4  N5  N6  N7  N10  N11  N12  N13 /param=ref;

model dv = Prin1 Prin2 Prin3  factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

run;

2.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2  N3  N4  N5  N6  N7  N10  N11  N12  N13 /param=effect;

model dv = Prin1 Prin2 Prin3  factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

run;

3.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2  N3  N4  N5  N6  N7  N10  N11  N12  N13 /param=ref;

model dv = Prin1 Prin2 Prin3  factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

unit Prin1=Prin1  =50000 Prin2  =50000 Prin3  =50000

run;

I tried the 3 different coding as above, but all failed to get a N variable into the model

Where variables with prefix "N" are ordinal variables like nationality,sex,most of them with the scope (0,9)

Prin1-prin3 are variables extracted from principal analysis,the scope for this variable is between (-Million,+Million)

factor1-factor8 are variables extracted from factor analysis,the scope for this variable is between (-2,+2)

both of them are summary of continous variables in some way,

dv is the dependent variable , with 1 shows the customer will leave, and 0 shows he will stay.

The question is when using stepwise, only prin1 and some factor variables remains, not even one N variable remains.

while judge from the real business, at least nationality is very useful to determine whether a customer will leave,

WHY not even one N variable remains?

what's wrong with my coding for Proc Logistic?

Thanks in Advance.

Dawn

1 ACCEPTED SOLUTION

Accepted Solutions
1zmm
Quartz | Level 8

You have specified the variables with an N prefix in the CLASS statement but not as independent variables in the MODEL statement.

PROC LOGISTIC will select only independent variables from the MODEL statement so that it will not select any of the N-prefix variables.

You also state that the N-prefix variables are ordinal variables but provide as examples only nominal variables (nationality, sex).

Generally, you should not perform principal components analysis or factor analysis on nominal variables but preferably only on interval/ratio/continuous variables.

Reference coding is preferred to effect coding in the PROC LOGISTIC CLASS statement because the former is easier to translate into measures of effect (like odds ratios) than the latter.  Variable selection in regression procedures has been discussed previously in this forum and is somewhat problematic.  Preferable would be some of the methods in PROC GLMSELECT, even though these methods are not optimized for dichotomous dependent variables like those used in logistic regression.

View solution in original post

4 REPLIES 4
1zmm
Quartz | Level 8

You have specified the variables with an N prefix in the CLASS statement but not as independent variables in the MODEL statement.

PROC LOGISTIC will select only independent variables from the MODEL statement so that it will not select any of the N-prefix variables.

You also state that the N-prefix variables are ordinal variables but provide as examples only nominal variables (nationality, sex).

Generally, you should not perform principal components analysis or factor analysis on nominal variables but preferably only on interval/ratio/continuous variables.

Reference coding is preferred to effect coding in the PROC LOGISTIC CLASS statement because the former is easier to translate into measures of effect (like odds ratios) than the latter.  Variable selection in regression procedures has been discussed previously in this forum and is somewhat problematic.  Preferable would be some of the methods in PROC GLMSELECT, even though these methods are not optimized for dichotomous dependent variables like those used in logistic regression.

bbb_NG
Fluorite | Level 6


1zmm,

Thx for your patient explanation.

Yes that variables with N-prefix are nominal variables,

those Prin and factor variables are generated from continuous variables only.

After I raised this question, I looked around the community, to find that proc logistic combined with class defining is not recommended, they suggest glmselect as you said.

I have one more question,can you take time to reply it?

When i used the following code (45 continuous variables)

proc princomp data=slide.sb_vm10 cov outstat=temp_prin1;

var  c1-c45;

run;

for eg variables A with large scope is within (-1M,1M),variables B with small scope  is within (-1,1),

it seems that the coefficient for Eigenvectors like prin1 will be Zero for those variables B.

Do u know in mind how to deal with such things?

Thx in advance.

SteveDenham
Jade | Level 19

Of course the coefficient is zero, or nearly so.  Variable A explains almost all of the total variation, so the amount of variation left for Variable B is negligible.  If you look at the eigenvalues associated with the vectors this should be apparent.

The question comes down to RELATIVE variability, so perhaps rescaling would help.  Not normalizing, as that will remove differences in variability.  Just putting things on the same scale will help.

Although I don't really know how you intend to use the results in forecasting a timeseries.

Steve Denham

bbb_NG
Fluorite | Level 6

Steve,

     Glad I see your recommend  ,"not Normalizing" but just "rescaling", I was just to normalizing.You saved me.Thanks

Dawn

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Discussion stats
  • 4 replies
  • 2217 views
  • 3 likes
  • 3 in conversation