Logistic Regression - Training and validation score exactly the same.

Posted 10-29-2019 10:16 AM
(484 views)

I tried to use the all-subsets selection macros (assess and fitandscore) from the Predictive Modeling Using Logistic Regression course notes.

But for some reason, my training and validation scores are basically the same (graph below) and the profit plot is a horizontal line. Any advice on why this is happening? I double-checked the code and the training and validation data sets and I don't see any issues.

The only possible source of concern is that all my input variables are either binary or categorical (converted to dummy variables with a reference level). Would that be a problem?

Code:

%macro assess(data=,inputcount=,inputsinmodel=,index=);

   /* Sort the scored data by descending predicted event probability. */
   proc sort data=scored&data;
      by descending p_1;
   run;

   data assess;
      attrib DATAROLE length=$5;
      retain sse 0 csum 0 DATAROLE "&data";
      /* n[t,d]: weighted counts by actual outcome t and decision d. */
      array n[0:1,0:1] _temporary_ (0 0 0 0);
      /* w[t]: weight applied to each observation with actual outcome t. */
      array w[0:1] _temporary_
         (%sysevalf(&pi0/&rho0) %sysevalf(&pi1/&rho1));
      keep DATAROLE INPUT_COUNT INDEX
           TOTAL_PROFIT OVERALL_AVG_PROFIT ASE C;
      set scored&data end=last;
      /* Expected profit of decision 1 (d1) and decision 0 (d0). */
      d1=&PF11*p_1+&PF01*p_0;
      d0=&PF10*p_1+&PF00*p_0;
      t=(strip(ischurn)="1");
      d=(d1>d0);
      n[t,d] + w[t];
      sse + (ischurn-p_1)**2;
      csum + ((n[1,1]+n[1,0])*(1-t)*w[0]);
      /* On the last row, write one summary record for this model. */
      if last then do;
         INPUT_COUNT=&inputcount;
         TOTAL_PROFIT =
            sum(&PF11*n[1,1],&PF10*n[1,0],&PF01*n[0,1],&PF00*n[0,0]);
         OVERALL_AVG_PROFIT =
            TOTAL_PROFIT/sum(n[0,0],n[1,0],n[0,1],n[1,1]);
         ASE = sse/sum(n[0,0],n[1,0],n[0,1],n[1,1]);
         C = csum/(sum(n[0,0],n[0,1])*sum(n[1,0],n[1,1]));
         index=&index;
         output;
      end;
   run;

   /* Add this model's summary record to the running results table. */
   proc append base=results data=assess force;
   run;

%mend assess;

%macro fitandscore();

   /* Delete any results table left over from a previous run. */
   proc datasets
      library=work
      nodetails
      nolist;
      delete results;
   run;

   %do model_indx=1 %to &lastindx;

      %let im=&&inputs&model_indx;
      %let ic=&&ic&model_indx;

      /* Fit on the training data, then score both training and validation data. */
      proc logistic data=churnmod.imputed2 des NAMELEN=50;
         model ischurn=&im;
         score data=churnmod.imputed2
               out=scoredtrain(keep=ischurn p_1 p_0)
               priorevent=&pi1;
         score data=churnmod.valid2
               out=scoredvalid(keep=ischurn p_1 p_0)
               priorevent=&pi1;
      run;

      %assess(data=train,
              inputcount=&ic,
              inputsinmodel=&im,
              index=&model_indx);

      %assess(data=VALID,
              inputcount=&ic,
              inputsinmodel=&im,
              index=&model_indx);

   %end;

%mend fitandscore;
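
One thing that may be worth checking about the flat profit line: in the assess macro above, TOTAL_PROFIT depends only on the weighted counts n[t,d]. If every observation gets the same decision d (always 1 or always 0), those counts no longer depend on which model was fitted, so the profit would be identical for every model. The sketch below repeats the decision rule on one scored table to see whether that is happening. It assumes scoredvalid from the last PROC LOGISTIC run still exists in WORK and that the PF11, PF10, PF01 and PF00 macro variables are defined. The data set name decision_check is only a placeholder.

```sas
/* Sketch only: recompute the decision used by the assess macro.      */
/* Assumes work.scoredvalid exists and PF11/PF10/PF01/PF00 are set.   */
data decision_check;                /* placeholder data set name */
   set scoredvalid;
   d1=&PF11*p_1+&PF01*p_0;          /* expected profit of decision 1 */
   d0=&PF10*p_1+&PF00*p_0;          /* expected profit of decision 0 */
   d=(d1>d0);                       /* same rule as in assess */
run;

/* If d takes only one value, the profit cannot vary across models. */
proc freq data=decision_check;
   tables ischurn*d / missing;
run;
```

If the table shows both values of d, the flat profit line probably has another cause and this check can be set aside.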

2 REPLIES

To help us better assist, please provide sample data that would yield these results.

Thanks,

-unison

Clearly, you have created TRAIN and VALID incorrectly.

The first debugging tool for you to try is to actually look at, with your own eyes, the two different data sets named TRAIN and VALID and see if they actually are different. We can't do that for you, because we don't have those data sets.
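
As a rough sketch of that first check, the two tables could be compared directly. The example assumes the tables of interest are the ones passed to PROC LOGISTIC in the original post (churnmod.imputed2 for training and churnmod.valid2 for validation). Adjust the names if the split is stored elsewhere.

```sas
/* Sketch only: compare the training and validation tables row by row. */
/* Table names are taken from the PROC LOGISTIC call in the post.      */
proc compare base=churnmod.imputed2 compare=churnmod.valid2
             briefsummary listall;
run;

/* SYSINFO holds the PROC COMPARE return code; 0 means no differences */
/* were found. It must be read right after the step finishes.         */
%put NOTE: PROC COMPARE return code = &sysinfo;

/* Row counts and target distribution in each table. */
proc freq data=churnmod.imputed2;
   tables ischurn / missing;
run;

proc freq data=churnmod.valid2;
   tables ischurn / missing;
run;
```

A return code of 0, or identical row counts and frequencies, would suggest that the two tables hold the same rows.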

Also, you need to examine how these data sets were created to make sure you haven't somehow done something that would cause this result.
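
For comparison with however the split was actually done, here is a minimal sketch of one common way to create a stratified training/validation split. The input table churnmod.all, the 67% sampling rate, the seed, and the output names are all hypothetical and do not come from the original post.

```sas
/* Sketch only. churnmod.all is a hypothetical table holding every row */
/* before the split. The rate and seed are arbitrary.                  */
proc sort data=churnmod.all out=work.all_sorted;
   by ischurn;                      /* STRATA requires sorted input */
run;

proc surveyselect data=work.all_sorted out=work.split
                  method=srs samprate=0.67 seed=12345 outall;
   strata ischurn;                  /* keep the event rate similar in both parts */
run;

/* OUTALL keeps every row and adds Selected (1 = sampled, 0 = not). */
data work.train work.valid;
   set work.split;
   if selected then output work.train;
   else output work.valid;
run;
```

With a split like this, each row lands in exactly one of the two output tables, so their row counts should add up to the size of the input table.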

--

Paige Miller
