Hi All,
Summary of Forward Selection
Step | Effect Entered | DF | Number In | Score Chi-Square | Pr > ChiSq |
1 | HOLIDAYS_EVER | 1 | 1 | 25674.4253 | <.0001 |
2 | No_Open_Emails | 1 | 2 | 9588.2483 | <.0001 |
3 | ATTRIBUTE1 | 2 | 3 | 5585.7066 | <.0001 |
4 | PROMO | 1 | 4 | 2264.6877 | <.0001 |
5 | AcornCategory | 6 | 5 | 2352.6139 | <.0001 |
6 | WEEKLY_AVG_TRANSACTI | 1 | 6 | 1488.1469 | <.0001 |
7 | PREFERENCE_CODE | 1 | 7 | 1126.3222 | <.0001 |
8 | TOT_PRODUCTS | 1 | 8 | 777.0709 | <.0001 |
9 | AFFLUENCE_KEY_DESC | 4 | 9 | 425.1519 | <.0001 |
10 | LIFESTAGE_KEY_DESC | 3 | 10 | 478.32 | <.0001 |
11 | TOTAL_VALUE | 1 | 11 | 354.7241 | <.0001 |
12 | REGISTRATION_DELIVER | 1 | 12 | 261.7896 | <.0001 |
13 | DISCOUNT | 1 | 13 | 34.4105 | <.0001 |
14 | REFERER | 1 | 14 | 9.2234 | 0.0024 |
Association of Predicted Probabilities and Observed Responses
Percent Concordant | 80.5 | Somers' D | 0.611 |
Percent Discordant | 19.4 | Gamma | 0.612 |
Percent Tied | 0.2 | Tau-a | 0.279 |
Pairs | 9109345450 | c | 0.806 |
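As a reading aid for the association table above, here is a minimal sketch of how the summary statistics follow from the pair percentages, assuming the usual definitions (Somers' D and c are taken over all pairs, Gamma ignores tied pairs). The variable names are illustrative only. Tau-a is left out because it needs the total number of observations, which the output above does not show.

```python
# Pair percentages copied from the association table above.
pct_concordant = 80.5
pct_discordant = 19.4
pct_tied = 0.2  # the three values sum to 100.1 because of rounding in the output

# Somers' D: (concordant - discordant) as a share of all pairs.
somers_d = (pct_concordant - pct_discordant) / 100

# Gamma: same numerator, but tied pairs are excluded from the denominator.
gamma = (pct_concordant - pct_discordant) / (pct_concordant + pct_discordant)

# c: concordant pairs plus half of the tied pairs, as a share of all pairs.
c_stat = (pct_concordant + 0.5 * pct_tied) / 100

print(round(somers_d, 3), round(gamma, 3), round(c_stat, 3))
```

Rounded to three decimals, these agree with the Somers' D, Gamma, and c values reported in the table.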
I highly suggest you review the following pages and then try asking your question again. This is a very broad question, in my opinion.
Statistical Computing Seminars: Introduction to SAS proc logistic
https://www.google.com/search?q=proc+logistic+site%3Alexjansen.com
Google search using: proc logistic site:lexjansen.com
Hi,
Being a classically trained statistician who was introduced to data mining only later in my career, I consider myself biased against data dredging. Stepwise selection is often brought up as a pragmatic example of using computational power to replace domain knowledge. In data mining it is not uncommon to start with hundreds or thousands of variables, and it is simply impractical to analyze them one at a time. That's where I tend to use stepwise regression: as an initial variable selection method, in combination with other variable selection methods such as decision trees, IV, …
In your case it seems like you’re starting with a small number of variables. That’s where domain knowledge should come in to help decide what to include and what to exclude, sometimes regardless of their p-value.
G
A quick one you say? Not sure.
You have a large number of cases, which means that you could detect very subtle effects which in fact don't make any practical difference. A statistically significant effect is not necessarily an important effect. One way you can assess the importance of an effect empirically is by looking at the change in Percent Concordant when you remove a variable from the model.
Percent concordant is computed over all pairs made up of one non-event and one event. If case i is not an event and case j is an event, the pair (i, j) is concordant when the predicted probabilities satisfy P(i) < P(j), discordant when P(i) > P(j), and tied when P(i) = P(j). Percent concordant is the share of those pairs that are concordant.
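The pair counting described above can be sketched in a few lines of Python. This is only an illustration, not how PROC LOGISTIC computes it: the function name `pair_percentages` and the toy data are made up, and the brute-force loop over every pair would be far too slow for the roughly 9 billion pairs in this thread.

```python
from itertools import product

def pair_percentages(events, probs):
    """Return (concordant, discordant, tied) percentages over all
    (non-event, event) pairs, given 0/1 outcomes and predicted probabilities."""
    non_event_p = [p for y, p in zip(events, probs) if y == 0]
    event_p = [p for y, p in zip(events, probs) if y == 1]
    conc = disc = tied = 0
    for p_i, p_j in product(non_event_p, event_p):
        if p_i < p_j:      # the event case got the higher probability
            conc += 1
        elif p_i > p_j:    # the non-event case got the higher probability
            disc += 1
        else:
            tied += 1
    total = conc + disc + tied
    return tuple(100.0 * n / total for n in (conc, disc, tied))

# Hypothetical toy data: five cases scored by a "full" model and by a
# "reduced" model with one variable removed.
y = [0, 0, 0, 1, 1]
p_full = [0.10, 0.40, 0.35, 0.80, 0.40]
p_reduced = [0.20, 0.50, 0.30, 0.60, 0.30]

full = pair_percentages(y, p_full)
reduced = pair_percentages(y, p_reduced)
print("full:   ", [round(v, 1) for v in full])
print("reduced:", [round(v, 1) for v in reduced])
print("drop in percent concordant:", round(full[0] - reduced[0], 1))
```

Comparing the first value between the two models gives the change in percent concordant that can be used to judge whether the removed variable matters in practice.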
hth
Pg
This looks like an overfitting problem. I suggest checking for collinearity before adding variables to the analysis.
Naeem