Fae
Obsidian | Level 7

 

I tried to use the all-subsets selection macros (assess and fitandscore) from the Predictive Modeling Using Logistic Regression course notes.

 

But for some reason, my training and validation scores are basically the same (graph below) and the profit plot is a horizontal line. Any advice on why this is happening? I double-checked the code and the training and validation data sets, and I don't see any issues.

 

The only possible source of concern I can see is that all my input variables are either binary or categorical (converted to dummy variables with a reference level). Could that be the cause?

 

Code:

 


%macro assess(data=,inputcount=,inputsinmodel=,index=);

   /* Sort so the highest predicted probabilities come first (csum relies on this order). */
   proc sort data=scored&data;
      by descending p_1;
   run;

   data assess;
      attrib DATAROLE length=$5;
      retain sse 0 csum 0 DATAROLE "&data";
      /* n[t,d]: weighted counts by actual class t and decision d */
      array n[0:1,0:1] _temporary_ (0 0 0 0);
      /* w[t]: class weights built from macro variables defined outside this post */
      array w[0:1] _temporary_
         (%sysevalf(&pi0/&rho0) %sysevalf(&pi1/&rho1));
      keep DATAROLE INPUT_COUNT INDEX
           TOTAL_PROFIT OVERALL_AVG_PROFIT ASE C;

      set scored&data end=last;

      /* Expected profit of each decision */
      d1=&PF11*p_1+&PF01*p_0;
      d0=&PF10*p_1+&PF00*p_0;

      t=(strip(ischurn)="1");   /* actual outcome */
      d=(d1>d0);                /* decision with the higher expected profit */

      n[t,d] + w[t];
      sse + (ischurn-p_1)**2;
      csum + ((n[1,1]+n[1,0])*(1-t)*w[0]);

      /* After the last row, write one summary record for this model. */
      if last then do;
         INPUT_COUNT=&inputcount;
         TOTAL_PROFIT =
            sum(&PF11*n[1,1],&PF10*n[1,0],&PF01*n[0,1],&PF00*n[0,0]);
         OVERALL_AVG_PROFIT =
            TOTAL_PROFIT/sum(n[0,0],n[1,0],n[0,1],n[1,1]);
         ASE = sse/sum(n[0,0],n[1,0],n[0,1],n[1,1]);
         C = csum/(sum(n[0,0],n[0,1])*sum(n[1,0],n[1,1]));
         index=&index;
         output;
      end;
   run;

   proc append base=results data=assess force;
   run;

%mend assess;


%macro fitandscore();

   /* Clear results left over from any previous run. */
   proc datasets
      library=work
      nodetails
      nolist;
      delete results;
   run;

   %do model_indx=1 %to &lastindx;
      %let im=&&inputs&model_indx;
      %let ic=&&ic&model_indx;

      /* Fit on churnmod.imputed2, then score both it and churnmod.valid2. */
      proc logistic data=churnmod.imputed2 des NAMELEN=50;
         model ischurn=&im;
         score data=churnmod.imputed2
               out=scoredtrain(keep=ischurn p_1 p_0)
               priorevent=&pi1;
         score data=churnmod.valid2
               out=scoredvalid(keep=ischurn p_1 p_0)
               priorevent=&pi1;
      run;

      %assess(data=train,
              inputcount=&ic,
              inputsinmodel=&im,
              index=&model_indx);
      %assess(data=VALID,
              inputcount=&ic,
              inputsinmodel=&im,
              index=&model_indx);

   %end;
%mend fitandscore;

 

 

 

[Attached screenshots: SAS.PNG, sas2.PNG]

2 REPLIES
unison
Lapis Lazuli | Level 10

To help us better assist, please provide sample data that would yield these results.

 

Thanks,

-unison

PaigeMiller
Diamond | Level 26

Clearly, you have created TRAIN and VALID incorrectly.

 

The first debugging tool for you to try is to actually look at, with your own eyes, the two different data sets named TRAIN and VALID and see if they actually are different. We can't do that for you, because we don't have those data sets.


Also, you need to examine how these data sets were created to make sure you haven't somehow done something that would cause this result.
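
One way to run that check, as a minimal sketch only: it assumes the training and validation data are the two data sets named in the posted code (churnmod.imputed2 and churnmod.valid2) and that both contain the target ischurn. Adjust the names if yours differ.

```sas
/* Sketch only: data set names are taken from the code posted above. */

/* 1. Row counts and variable lists for each data set. */
proc contents data=churnmod.imputed2 short;
run;

proc contents data=churnmod.valid2 short;
run;

/* 2. Row-by-row comparison. If PROC COMPARE reports no unequal  */
/*    values, the two data sets hold the same rows in the same   */
/*    order, which would explain identical scores.               */
proc compare base=churnmod.imputed2 compare=churnmod.valid2;
run;

/* 3. Target distribution in each data set. */
proc freq data=churnmod.imputed2;
   tables ischurn;
run;

proc freq data=churnmod.valid2;
   tables ischurn;
run;
```

If the comparison shows the data sets are genuinely different, the next place to look is the step that split the original data into the two sets.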

--
Paige Miller
