PROC LOGISTIC results do not include ordinal variables


Posted 12-30-2012 10:42 PM
(1040 views)

Dear all,

Here's the code.

1.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 /param=ref;

model dv = Prin1 Prin2 Prin3 factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

run;

2.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 /param=effect;

model dv = Prin1 Prin2 Prin3 factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

run;

3.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 /param=ref;

model dv = Prin1 Prin2 Prin3 factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

units Prin1=50000 Prin2=50000 Prin3=50000;

run;

I tried the three variants above, but none of them brought an N variable into the model.

The variables with the prefix "N" are ordinal variables such as nationality and sex; most of them take values in the range 0 to 9.

Prin1-Prin3 are variables extracted by principal component analysis; their values range from roughly -1 million to +1 million.

factor1-factor8 are variables extracted by factor analysis; their values range from roughly -2 to +2.

Both sets are, in some way, summaries of continuous variables.

dv is the dependent variable: 1 means the customer will leave, and 0 means the customer will stay.

The problem is that with stepwise selection only Prin1 and some of the factor variables remain; not a single N variable remains.

Judging from the real business, though, at least nationality is very useful in determining whether a customer will leave.

Why does not even one N variable remain?

What's wrong with my PROC LOGISTIC code?

Thanks in Advance.

Dawn

1 ACCEPTED SOLUTION


You have specified the variables with an N prefix in the CLASS statement but not as independent variables in the MODEL statement.

PROC LOGISTIC selects only from the independent variables listed in the MODEL statement, so it will not select any of the N-prefix variables.
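As a minimal sketch of that fix, the first variant from the question could list the N-prefix variables in the MODEL statement as well as in the CLASS statement. The `event='1'` response option is an assumption added here so that the model targets customers who leave; it is not in the original code.

```sas
proc logistic data=slide.sb_vm_training outmodel=slide.model;
   /* CLASS only declares how the N variables are coded (reference coding here) */
   class N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 / param=ref;
   /* The N variables must also appear in MODEL to become selection candidates. */
   /* event='1' is an assumption: model the probability that dv = 1 (leave).   */
   model dv(event='1') = N2 N3 N4 N5 N6 N7 N10 N11 N12 N13
                         Prin1 Prin2 Prin3
                         factor1-factor8
         / selection=stepwise;
run;
```

Whether any N variable then survives stepwise selection still depends on the data and on the entry and stay significance levels.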

You also state that the N-prefix variables are ordinal variables but provide as examples only nominal variables (nationality, sex).

Generally, you should not perform principal components analysis or factor analysis on nominal variables but preferably only on interval/ratio/continuous variables.

Reference coding is preferred to effect coding in the PROC LOGISTIC CLASS statement because the former is easier to translate into measures of effect (like odds ratios) than the latter. Variable selection in regression procedures has been discussed previously in this forum and is somewhat problematic. Preferable would be some of the methods in PROC GLMSELECT, even though these methods are not optimized for dichotomous dependent variables like those used in logistic regression.
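One way the PROC GLMSELECT suggestion might look is sketched below. The choice of LASSO with 5-fold cross validation and the seed value are illustrative assumptions, not recommendations from this thread, and the procedure fits a linear model, so dv is treated as a numeric 0/1 response rather than through a logit link.

```sas
/* seed= is an arbitrary value, used only to make the random CV folds repeatable */
proc glmselect data=slide.sb_vm_training seed=12345;
   class N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 / param=ref;
   /* dv must be numeric here; this is linear, not logistic, regression */
   model dv = N2 N3 N4 N5 N6 N7 N10 N11 N12 N13
              Prin1 Prin2 Prin3
              factor1-factor8
         / selection=lasso(choose=cv stop=none) cvmethod=random(5);
run;
```

The effects chosen this way could then be refit in PROC LOGISTIC without a SELECTION= option to obtain odds ratios.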


1zmm,

Thanks for your patient explanation.

Yes, the variables with the N prefix are nominal variables; the Prin and factor variables are generated from continuous variables only.

After I raised this question, I looked around the community and found that PROC LOGISTIC combined with a CLASS statement is not recommended; people suggest GLMSELECT, as you said.

I have one more question; could you take the time to answer it?

I used the following code (45 continuous variables):

proc princomp data=slide.sb_vm10 cov outstat=temp_prin1;

var c1-c45;

run;

For example, variable A has a large range, within (-1M, 1M), and variable B has a small range, within (-1, 1).

It seems that the eigenvector coefficients (for Prin1, for example) are zero for variables like B.

Do you know how to deal with this?

Thanks in advance.


Of course the coefficient is zero, or nearly so. Variable A explains almost all of the total variation, so the amount of variation left for Variable B is negligible. If you look at the eigenvalues associated with the vectors this should be apparent.
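A quick way to see the imbalance before running the principal component analysis is to compare the raw variances of the inputs. This is only a sketch that reuses the dataset and variable names from the question:

```sas
/* With the COV option, PROC PRINCOMP works on these raw variances, so a  */
/* variable ranging over millions has a variance many orders of magnitude */
/* larger than one ranging over (-1, 1).                                  */
proc means data=slide.sb_vm10 n mean std var min max;
   var c1-c45;
run;
```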

The question comes down to RELATIVE variability, so perhaps rescaling would help. Not normalizing, as that will remove differences in variability. Just putting things on the same scale will help.
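A minimal sketch of such a rescaling, assuming (hypothetically) that c1-c10 are the variables measured in the millions; which variables to rescale, and by what factor, has to come from knowledge of the data. The output dataset name `work.sb_vm10_rescaled` is also made up for illustration.

```sas
data work.sb_vm10_rescaled;
   set slide.sb_vm10;
   /* Assumption: c1-c10 are the large-scale variables; express them in millions */
   array big{*} c1-c10;
   do i = 1 to dim(big);
      big{i} = big{i} / 1e6;
   end;
   drop i;
run;

/* Same analysis as in the question, now on comparable units */
proc princomp data=work.sb_vm10_rescaled cov outstat=temp_prin1;
   var c1-c45;
run;
```

Dropping the COV option instead would make PROC PRINCOMP analyze the correlation matrix, which amounts to the standardization this reply advises against.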

That said, I don't really know how you intend to use the results in forecasting a time series.

Steve Denham


Steve,

I'm glad I saw your recommendation: not normalizing, just rescaling. I was just about to normalize. You saved me. Thanks.

Dawn
