Question
Fluorite | Level 6

Hi All,

Just a quick one regarding PROC LOGISTIC in SAS. I entered 15 variables in my model and 14 of them seem to be good. Does that mean I should keep all of them in the model? I also have trouble understanding the concordant and discordant percentages. Does it mean that the model has predicted 80.5% correctly? If I add or remove variables, which statistic should tell me that the model is still good? Is it c (the AUC value)? In the past I used other software to build models, so I am a bit lost.
Many Thanks
Summary of Forward Selection

Step  Effect Entered         DF  Number In  Score Chi-Square  Pr > ChiSq
1     HOLIDAYS_EVER          1   1          25674.4253        <.0001
2     No_Open_Emails         1   2          9588.2483         <.0001
3     ATTRIBUTE1             2   3          5585.7066         <.0001
4     PROMO                  1   4          2264.6877         <.0001
5     AcornCategory          6   5          2352.6139         <.0001
6     WEEKLY_AVG_TRANSACTI   1   6          1488.1469         <.0001
7     PREFERENCE_CODE        1   7          1126.3222         <.0001
8     TOT_PRODUCTS           1   8          777.0709          <.0001
9     AFFLUENCE_KEY_DESC     4   9          425.1519          <.0001
10    LIFESTAGE_KEY_DESC     3   10         478.32            <.0001
11    TOTAL_VALUE            1   11         354.7241          <.0001
12    REGISTRATION_DELIVER   1   12         261.7896          <.0001
13    DISCOUNT               1   13         34.4105           <.0001
14    REFERER                1   14         9.2234            0.0024
Association of Predicted Probabilities and Observed Responses

Percent Concordant   80.5         Somers' D   0.611
Percent Discordant   19.4         Gamma       0.612
Percent Tied         0.2          Tau-a       0.279
Pairs                9109345450   c           0.806
4 REPLIES
Reeza
Super User

I highly suggest you review the following pages and then try asking your question again. This is a very broad question, in my opinion.

Statistical Computing Seminars: Introduction to SAS proc logistic

https://www.google.com/search?q=proc+logistic+site%3Alexjansen.com

Google search using: proc logistic site:lexjansen.com

adjgiulio
Obsidian | Level 7

Hi,

Being a classically trained statistician who was introduced to data mining only later in my career, I consider myself biased against data dredging. Stepwise selection is often brought up as a pragmatic example of using computational power to replace domain knowledge. In data mining it is not uncommon to start with hundreds or thousands of variables, and it is simply impractical to analyze them one at a time. That’s where I tend to use stepwise regression: as an initial variable selection method, in combination with other variable selection methods such as decision trees, IV, and so on.
In your case it seems like you’re starting with a small number of variables. That’s where domain knowledge should come in to help decide what to include and what to exclude, sometimes regardless of the p-values.

G

PGStats
Opal | Level 21

A quick one you say? Not sure.

You have a large number of cases, which means that you could detect very subtle effects which in fact don't make any practical difference. A statistically significant effect is not necessarily an important effect. One way you can assess the importance of an effect empirically is by looking at the change in Percent Concordant when you remove a variable from the model.
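To make that comparison concrete, here is a minimal sketch in Python (used here only so it can be run anywhere, not a SAS recipe). It computes c from ranks, which avoids enumerating billions of pairs, and compares a full model against a reduced one. The event flags and the predicted probabilities `p_full` and `p_reduced` are made-up illustrative values; in practice they would be the predicted probabilities saved from the two fits.

```python
def c_statistic(probs, events):
    """c (AUC): share of event/non-event pairs ranked correctly, ties count 1/2."""
    # Rank cases by predicted probability, giving tied values their average rank.
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    ranks = [0.0] * len(probs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and probs[order[j + 1]] == probs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_event = sum(events)
    n_nonevent = len(events) - n_event
    rank_sum = sum(r for r, e in zip(ranks, events) if e == 1)
    # Mann-Whitney U for the event group, divided by the number of pairs.
    u = rank_sum - n_event * (n_event + 1) / 2
    return u / (n_event * n_nonevent)


# Hypothetical data: 1 = event, 0 = non-event.
events = [0, 0, 0, 1, 0, 1, 1, 1]
p_full = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]     # all variables
p_reduced = [0.15, 0.25, 0.45, 0.35, 0.40, 0.50, 0.70, 0.90]  # one variable dropped

c_full = c_statistic(p_full, events)
c_reduced = c_statistic(p_reduced, events)
print(f"c full    = {c_full:.4f}")              # 0.9375
print(f"c reduced = {c_reduced:.4f}")           # 0.8750
print(f"drop      = {c_full - c_reduced:.4f}")  # 0.0625
```

If the drop is negligible for your purposes, the removed variable is statistically significant but adds little practical discrimination.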

Percent Concordant is computed over all pairs made of one non-event and one event. If case i is not an event and case j is an event, then the ij pair is concordant when the predicted P(i) < P(j), discordant when P(i) > P(j), and tied when P(i) = P(j). Percent Concordant is the share of those pairs that are concordant.
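The pair definition above can be sketched as a brute-force count in Python. This is an illustration of the definition on made-up data, not a claim about how PROC LOGISTIC computes the table internally. The function name `pair_association` and the sample values are assumptions for the example.

```python
def pair_association(probs, events):
    """Classify every (non-event, event) pair as concordant, discordant or tied."""
    event_p = [p for p, e in zip(probs, events) if e == 1]
    nonevent_p = [p for p, e in zip(probs, events) if e == 0]

    concordant = discordant = tied = 0
    for pj in event_p:          # case j is an event
        for pi in nonevent_p:   # case i is not an event
            if pi < pj:
                concordant += 1
            elif pi > pj:
                discordant += 1
            else:
                tied += 1

    pairs = len(event_p) * len(nonevent_p)
    return {
        "pairs": pairs,
        "concordant": concordant,
        "discordant": discordant,
        "tied": tied,
        "pct_concordant": 100 * concordant / pairs,
        "pct_discordant": 100 * discordant / pairs,
        "pct_tied": 100 * tied / pairs,
        # Summary measures derived from the same counts
        # (gamma assumes at least one non-tied pair).
        "somers_d": (concordant - discordant) / pairs,
        "gamma": (concordant - discordant) / (concordant + discordant),
        "c": (concordant + 0.5 * tied) / pairs,
    }


# Hypothetical data: 3 events and 3 non-events give 9 pairs
# (5 concordant, 3 discordant, 1 tied).
events = [0, 0, 1, 1, 0, 1]
probs = [0.2, 0.5, 0.5, 0.8, 0.9, 0.7]
result = pair_association(probs, events)
for name, value in result.items():
    print(f"{name:15s} {value}")
```

The same relations hold for the figures in the question: (80.5 - 19.4) / 100 = 0.611 is Somers' D, and (80.5 + 0.2 / 2) / 100 = 0.806 is c. So 80.5% is the share of event/non-event pairs ranked in the right order, not the share of cases classified correctly.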

hth

Pg

PG
stat_sas
Ammonite | Level 13

This seems to be an overfitting problem. I suggest checking for collinearity before adding variables to the analysis.
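As a rough illustration of such a check, the Python sketch below computes the Pearson correlation between two predictors and the variance inflation factor (VIF) that follows from it in the two-predictor case, VIF = 1 / (1 - r^2). The variable names are borrowed from the selection table, but the values are invented for the example, and the helper names are hypothetical. With more than two predictors, each VIF would come from regressing one predictor on all the others.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(r):
    """VIF when the model has exactly two predictors with correlation r."""
    return 1.0 / (1.0 - r * r)


# Invented values for two predictors that plausibly move together.
tot_products = [1, 2, 3, 4, 5]
total_value = [12, 19, 33, 41, 48]

r = pearson_r(tot_products, total_value)
vif = vif_two_predictors(r)
print(f"r   = {r:.4f}")
print(f"VIF = {vif:.1f}")
```

A common rule of thumb treats a VIF above about 10 as a sign that two predictors carry largely the same information, in which case keeping both may add little.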

Naeem

