Proc Logistic result does not include ordinal variable...

12-30-2012 10:42 PM

Dear all,

Here's the code.

1.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 /param=ref;

model dv = Prin1 Prin2 Prin3 factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

run;

2.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 /param=effect;

model dv = Prin1 Prin2 Prin3 factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

run;

3.

proc logistic data=slide.sb_vm_training outmodel=slide.model;

CLASS N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 /param=ref;

model dv = Prin1 Prin2 Prin3 factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 /selection=stepwise ;

units Prin1=50000 Prin2=50000 Prin3=50000;

run;

I tried the three variants above, but none of them got an N variable into the model.

The variables with the prefix "N" are ordinal variables such as nationality and sex, most of them with values in the range (0, 9).

Prin1-Prin3 are variables extracted from a principal component analysis; their values range over (-Million, +Million).

factor1-factor8 are variables extracted from a factor analysis; their values range over (-2, +2).

Both sets are summaries of continuous variables in some way.

dv is the dependent variable: 1 means the customer will leave, and 0 means he will stay.

The problem is that with stepwise selection only Prin1 and some of the factor variables remain; not a single N variable remains.

Judging from the real business, though, at least nationality is very useful for determining whether a customer will leave.

Why does not even one N variable remain?

What's wrong with my code for PROC LOGISTIC?

Thanks in Advance.

Dawn

Accepted Solutions


01-02-2013 08:08 AM

You have specified the variables with an N prefix in the CLASS statement but not as independent variables in the MODEL statement.

Stepwise selection in PROC LOGISTIC considers only the effects listed in the MODEL statement, so it will never select any of the N-prefix variables.
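A minimal sketch of that fix, reusing the data set and variable names from the question: the N-prefix variables appear in both the CLASS statement and the MODEL statement, so stepwise selection can consider them. The EVENT='1' response option is an addition that is not in the original code; it assumes dv is coded 0/1 and that the probability of leaving (dv=1) is the one to be modeled.

```sas
proc logistic data=slide.sb_vm_training outmodel=slide.model;
   /* CLASS only declares these variables as categorical */
   class N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 / param=ref;
   /* They must also be listed here to be candidates for selection. */
   /* event='1' is an assumption: model the probability that dv=1.  */
   model dv(event='1') = N2 N3 N4 N5 N6 N7 N10 N11 N12 N13
                         Prin1 Prin2 Prin3
                         factor1 factor2 factor3 factor4
                         factor5 factor6 factor7 factor8
         / selection=stepwise;
run;
```

Whether any N variable is actually retained still depends on the data and on the entry and stay significance levels used by the stepwise method.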

You also state that the N-prefix variables are ordinal variables but provide as examples only nominal variables (nationality, sex).

Generally, you should not perform principal components analysis or factor analysis on nominal variables but preferably only on interval/ratio/continuous variables.

Reference coding is preferred to effect coding in the PROC LOGISTIC CLASS statement because the former is easier to translate into measures of effect (like odds ratios) than the latter. Variable selection in regression procedures has been discussed previously in this forum and is somewhat problematic. Preferable would be some of the methods in PROC GLMSELECT, even though these methods are not optimized for dichotomous dependent variables like those used in logistic regression.
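As a rough sketch of the GLMSELECT route, assuming the same data set and variables as in the question: PROC GLMSELECT fits an ordinary linear model to the 0/1 response, so this can serve only as a screening step, and the selected effects would then be refit in PROC LOGISTIC. SELECT=SBC is one possible selection criterion, chosen here purely for illustration.

```sas
proc glmselect data=slide.sb_vm_training;
   /* Declare the nominal predictors as classification variables */
   class N2 N3 N4 N5 N6 N7 N10 N11 N12 N13;
   /* Linear-model screening of a 0/1 response (an approximation) */
   model dv = N2 N3 N4 N5 N6 N7 N10 N11 N12 N13
              Prin1 Prin2 Prin3
              factor1 factor2 factor3 factor4
              factor5 factor6 factor7 factor8
         / selection=stepwise(select=sbc);
run;
```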

All Replies


01-04-2013 02:54 AM

1zmm,

Thanks for your patient explanation.

Yes, the variables with the N prefix are nominal variables;

the Prin and factor variables are generated from continuous variables only.

After I raised this question, I looked around the community and found that PROC LOGISTIC combined with a CLASS statement is not recommended; they suggest GLMSELECT, as you said.

I have one more question. Could you take the time to answer it?

I used the following code (45 continuous variables):

proc princomp data=slide.sb_vm10 cov outstat=temp_prin1;

var c1-c45;

run;

For example, variable A has a large range, within (-1M, 1M), and variable B has a small range, within (-1, 1).

It seems that the coefficients in the eigenvectors (for Prin1, for example) are zero for variables like B.

Do you know how to deal with this?

Thanks in advance.


01-04-2013 07:23 AM

Of course the coefficient is zero, or nearly so. Variable A explains almost all of the total variation, so the amount of variation left for Variable B is negligible. If you look at the eigenvalues associated with the vectors this should be apparent.
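One way to see this before running the PCA is to compare the variances of the inputs directly. A small sketch, assuming the data set and the c1-c45 variables from the question:

```sas
/* Print the variance, standard deviation, and range of each input. */
/* Variables whose variance is many orders of magnitude larger will */
/* dominate a covariance-based PCA.                                 */
proc means data=slide.sb_vm10 n var std min max;
   var c1-c45;
run;
```

Because the COV option makes PROC PRINCOMP work from the covariance matrix, the leading components are driven by whichever variables show the largest variances in this output.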

The question comes down to RELATIVE variability, so perhaps rescaling would help. Not normalizing, as that will remove differences in variability. Just putting things on the same scale will help.
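A minimal sketch of what such rescaling could look like, under stated assumptions: c1 stands in for a "variable A" recorded in raw units within (-1M, 1M), and dividing by 1e6 (expressing it in millions) is just one illustrative choice of fixed constant. The output data set name work.sb_vm10_rescaled is hypothetical.

```sas
data work.sb_vm10_rescaled;   /* hypothetical output data set */
   set slide.sb_vm10;
   /* Assumption: c1 is a large-range variable; express it in millions. */
   /* A fixed constant changes the units but, unlike standardizing each */
   /* variable to unit variance, keeps real differences in variability. */
   c1 = c1 / 1e6;
run;

/* Covariance-based PCA on the rescaled inputs */
proc princomp data=work.sb_vm10_rescaled cov outstat=temp_prin1;
   var c1-c45;
run;
```

For comparison, dropping the COV option makes PROC PRINCOMP analyze the correlation matrix, which amounts to standardizing every variable and is the normalization this reply advises against.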

That said, I don't really know how you intend to use the results in forecasting a time series.

Steve Denham


01-04-2013 08:10 PM

Steve,

Glad I saw your recommendation: "not normalizing", just "rescaling". I was just about to normalize. You saved me. Thanks.

Dawn