## Problem with opposite effects in scorecard



Hi

I am using SAS Enterprise Miner 13.2 with the Credit Scoring add-on to build a model that predicts credit card usage.

I suspect a collinearity problem in my input data, because I always end up with at least one positive effect while the rest are negative. Depending on which criteria and variables I include, this can be a different variable in each setting, and the same variable may have a positive effect in some settings and a negative one in others.

What is a good strategy to avoid this problem?

It is very difficult to explain each variable on its own when one of them has the opposite effect.

Do I risk losing valuable information by excluding the variable?

Would it be a good approach to identify which of the variables included in the scorecard are related, and use that to explain this effect?

Or should I just keep the opposite effects and answer "because the statistician said so" when asked?

I know that the variables might be related. I am not too worried about new data coming from a different population, since we are looking at our own customer database and will continue to do so.

Analysis of Maximum Likelihood Estimates

| Parameter | DF | Estimate | Standard Error | Wald Chi-Square | Pr > ChiSq | Standardized Estimate | Exp(Est) |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | -2.9574 | 0.0697 | 1798.25 | <.0001 | | 0.052 |
| WOE_1 | 1 | -0.7656 | 0.0718 | 113.81 | <.0001 | -1.1490 | 0.465 |
| WOE_2 | 1 | -0.3554 | 0.1008 | 12.43 | 0.0004 | -0.3569 | 0.701 |
| WOE_3 | 1 | -0.4776 | 0.0592 | 65.10 | <.0001 | -0.2544 | 0.620 |
| WOE_4 | 1 | -0.2444 | 0.1340 | 3.33 | 0.0682 | -0.0642 | 0.783 |
| WOE_5 | 1 | 0.2427 | 0.1030 | 5.55 | 0.0185 | 0.0562 | 1.275 |

The last effect here is positive, while the rest are negative.
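In a table like this, the Wald chi-square (1 DF) is (Estimate / Standard Error)^2 and Exp(Est) is the odds ratio e^Estimate. A minimal Python sketch that recomputes those columns from the printed estimates, as a sanity check only (the inputs are rounded, so the last digit can differ):

```python
import math

# (parameter, estimate, standard error) copied from the table above
rows = [
    ("Intercept", -2.9574, 0.0697),
    ("WOE_1", -0.7656, 0.0718),
    ("WOE_2", -0.3554, 0.1008),
    ("WOE_3", -0.4776, 0.0592),
    ("WOE_4", -0.2444, 0.1340),
    ("WOE_5", 0.2427, 0.1030),
]


def wald_summary(estimate, std_err):
    """Return (Wald chi-square, p-value, odds ratio) for one coefficient."""
    z = estimate / std_err
    chi_sq = z * z
    # With 1 DF, P(ChiSq > z^2) equals the two-sided normal tail P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(estimate)
    return chi_sq, p_value, odds_ratio


for name, est, se in rows:
    chi_sq, p, odds = wald_summary(est, se)
    print(f"{name:10s} chi2={chi_sq:8.2f}  p={p:.4f}  exp(est)={odds:.3f}")
```

Recomputed this way, WOE_4 (p of about 0.068) is the only effect above the 0.05 level, and WOE_5 is the only odds ratio above 1.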

Fit statistics, just for fun

| Fit Statistic | Label | Train | Validation | Test |
|---|---|---|---|---|
| `_AIC_` | Akaike's Information Criterion | 3508.10 | . | . |
| `_ASE_` | Average Squared Error | 0.05 | 0.05 | 0.05 |
| `_AVERR_` | Average Error Function | 0.17 | 0.17 | 0.17 |
| `_DFE_` | Degrees of Freedom for Error | 10557.00 | . | . |
| `_DFM_` | Model Degrees of Freedom | 6.00 | . | . |
| `_DFT_` | Total Degrees of Freedom | 10563.00 | . | . |
| `_DIV_` | Divisor for ASE | 21126.00 | 15846.00 | 15850.00 |
| `_ERR_` | Error Function | 3496.10 | 2655.00 | 2670.64 |
| `_FPE_` | Final Prediction Error | 0.05 | . | . |
| `_MAX_` | Maximum Absolute Error | 1.00 | 1.00 | 0.99 |
| `_MSE_` | Mean Square Error | 0.05 | 0.05 | 0.05 |
| `_NOBS_` | Sum of Frequencies | 10563.00 | 7923.00 | 7925.00 |
| `_NW_` | Number of Estimate Weights | 6.00 | . | . |
| `_RASE_` | Root Average Sum of Squares | 0.21 | 0.21 | 0.21 |
| `_RFPE_` | Root Final Prediction Error | 0.21 | . | . |
| `_RMSE_` | Root Mean Squared Error | 0.21 | 0.21 | 0.21 |
| `_SBC_` | Schwarz's Bayesian Criterion | 3551.69 | . | . |
| `_SSE_` | Sum of Squared Errors | 963.57 | 724.58 | 731.04 |
| `_SUMW_` | Sum of Case Weights Times Freq | 21126.00 | 15846.00 | 15850.00 |
| `_MISC_` | Misclassification Rate | 0.05 | 0.05 | 0.05 |
| `_AUR_` | Area Under ROC | 0.83 | 0.82 | 0.81 |
| `_Gini_` | Gini Coefficient | 0.65 | 0.64 | 0.62 |
| `_KS_` | Kolmogorov-Smirnov Statistic | 0.51 | 0.52 | 0.51 |
| `_ARATIO_` | Accuracy Ratio | 0.65 | 0.64 | 0.62 |
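Several of these fit statistics are derived from one another. AIC and SBC add a complexity penalty to the error function, and ASE is SSE divided by the ASE divisor. A minimal Python check using the training column, assuming that `_ERR_` is -2 log-likelihood for this binary target (the reported AIC and SBC are consistent with that assumption):

```python
import math

# Training-set values copied from the fit statistics table above.
err = 3496.10   # _ERR_: error function (assumed to be -2 log-likelihood)
dfm = 6         # _DFM_: model degrees of freedom
nobs = 10563    # _NOBS_: sum of frequencies
sse = 963.57    # _SSE_: sum of squared errors
div = 21126     # _DIV_: divisor for ASE

aic = err + 2 * dfm                 # Akaike: penalty of 2 per parameter
sbc = err + dfm * math.log(nobs)    # Schwarz: penalty of ln(n) per parameter
ase = sse / div                     # average squared error
avg_err = err / div                 # average error function

print(f"AIC={aic:.2f}  SBC={sbc:.2f}  ASE={ase:.4f}  AVERR={avg_err:.4f}")
```

Because SBC penalizes each parameter by ln(10563), roughly 9.3, instead of 2, it favors smaller models more strongly than AIC does.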

## Re: Problem with opposite effects in scorecard

Posted in reply to KristineNavesta

You're absolutely right: it is likely due to collinearity among your inputs. Are you using a model selection method in the Scorecard node? That might help eliminate the problem.


## Re: Problem with opposite effects in scorecard

Posted in reply to WendyCzika

Yes, I am using stepwise model selection. Multicollinearity is a problem for most model selection methods as well: each variable makes sense on its own, but together they get coefficients that are too large in absolute value and have opposite signs.
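The sign flip described here can be reproduced with a tiny example. This is a sketch on made-up data, and it uses ordinary least squares rather than the logistic regression the Scorecard node fits, but the mechanism is essentially the same: a coefficient in a multiple regression measures a variable's effect while the other inputs are held fixed. A variable that is positively related to the target on its own can therefore get a negative coefficient once a highly correlated variable is also in the model. The helper names (`ols_two_predictors`, `simple_slope`) are hypothetical.

```python
def centered(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ols_two_predictors(x1, x2, y):
    """Least-squares slopes (b1, b2) for y ~ 1 + x1 + x2 via the 2x2 normal equations."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 * s12  # close to 0 when x1 and x2 are nearly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simple_slope(x, y):
    """Slope of y ~ 1 + x when x is the only predictor."""
    cx, cy = centered(x), centered(y)
    return dot(cx, cy) / dot(cx, cx)


# Hypothetical data: x2 is x1 plus a small perturbation, so they are highly collinear.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]
x2 = [a + e for a, e in zip(x1, noise)]
# Target constructed so that, holding x1 fixed, x2 has a negative effect.
y = [2.0 * a - 0.5 * b for a, b in zip(x1, x2)]

print("x2 alone:", round(simple_slope(x2, y), 3))  # positive
print("x1 and x2 together:", [round(b, 3) for b in ols_two_predictors(x1, x2, y)])
```

On its own, x2 has a clearly positive slope; in the joint model it gets the negative coefficient it was built with, even though nothing about x2 itself changed.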

I have tried adding a Variable Clustering node and using the cluster variables, but my fit statistics drop and I get a poorer model.

Is there a way in Enterprise Miner to find out which of the variables are most strongly correlated? Is using the cluster variables the best option?
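Outside Enterprise Miner, one common way to quantify which inputs are entangled is the variance inflation factor (VIF): VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing input j on all the other inputs. It equals the j-th diagonal element of the inverse of the inputs' correlation matrix. A minimal pure-Python sketch on hypothetical columns (in practice you would pass the WOE_ columns); a common rule of thumb treats VIFs above roughly 5 to 10 as a warning sign:

```python
def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    centered = []
    for col in columns:
        m = sum(col) / len(col)
        centered.append([x - m for x in col])
    norms = [sum(x * x for x in c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular: some inputs are exact linear combinations")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]


# Hypothetical inputs: x2 is nearly a copy of x1, x3 is uncorrelated with x1.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [a + e for a, e in zip(x1, [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3])]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
print([round(v, 1) for v in vif([x1, x2, x3])])
```

Here x1 and x2 get VIFs far above 10 while x3 stays close to 1, which points directly at the pair that is fighting over the same information.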


## Re: Problem with opposite effects in scorecard

Posted in reply to KristineNavesta

You could try doing variable selection with the HP Variable Selection node (on the HPDM tab). With unsupervised selection (an option for the Target Model property), it analyzes variance and reduces dimensionality by forward selection of the variables that contribute the most to the overall data variance. Alternatively, you can use sequential selection, which first performs unsupervised selection and then supervised selection, where the target is taken into account.
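The idea behind unsupervised forward selection can be sketched as follows. This is a hypothetical illustration, not the HP Variable Selection node's actual implementation, and `forward_variance_selection` is an invented name: at each step it picks the variable whose not-yet-explained part accounts for the most variance across all standardized variables, then removes that direction from every variable (a Gram-Schmidt step). With two near-duplicate inputs, it keeps only one of them.

```python
def standardize(col):
    """Center and scale a column to unit variance (assumes it is not constant)."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((x - mean) ** 2 for x in col) / n) ** 0.5
    return [(x - mean) / sd for x in col]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_variance_selection(columns, n_select):
    """Hypothetical greedy unsupervised selection; returns chosen column indices."""
    residuals = [standardize(c) for c in columns]
    selected = []
    for _ in range(n_select):
        best, best_score = None, 0.0
        for j, rj in enumerate(residuals):
            norm = dot(rj, rj)
            if j in selected or norm < 1e-10:
                continue  # already chosen, or nothing left to explain
            # Variance of all residuals captured by projecting them onto rj.
            score = sum(dot(rk, rj) ** 2 for rk in residuals) / norm
            if score > best_score:
                best, best_score = j, score
        if best is None:
            break
        selected.append(best)
        # Remove the chosen direction from every residual.
        rb = residuals[best]
        nb = dot(rb, rb)
        new_residuals = []
        for rk in residuals:
            coef = dot(rk, rb) / nb
            new_residuals.append([x - coef * y for x, y in zip(rk, rb)])
        residuals = new_residuals
    return selected


# Hypothetical inputs: x2 is nearly a copy of x1, x3 is uncorrelated with x1.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [a + e for a, e in zip(x1, [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3])]
x3 = [1, -1, -1, 1, 1, -1, -1, 1]
print(forward_variance_selection([x1, x2, x3], 2))
```

Because this kind of selection ignores the target, it reduces redundancy among the inputs but can also drop variables that are individually predictive, which may explain why the resulting model scores worse.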


## Re: Problem with opposite effects in scorecard

Posted in reply to WendyCzika

Very cool. The selected variables are very different from the ones the Interactive Grouping (IG) and Scorecard nodes would choose. Running them through the Interactive Grouping and Scorecard nodes, I get a model with fewer variables, but still one positive effect and three negative effects.

So I still have opposite effects and weaker coefficients, and the Model Comparison node prefers my previous model.

I am guessing that I have to accept that the data has too much collinearity, and that I really should try to find new data or more independent variables?
