Here is what our prof told us to run for the lift macro:

```sas
%MACRO liftchart1(data,yreal,phat,ngroups);
/* SAS MACRO to obtain a lift chart */
/* INPUT */
/* data    = data file that contains at least the two variables below: */
/* yreal   = real value of dependent variable Y (0 or 1) */
/* phat    = estimation of P(Y=1) for the model */
/* ngroups = number of groups used for phat (usually 10) */
/* OUTPUT */
/* The file "lift" at the end will contain the following variables:
   cumfreqpercent = % of observations of the validation sample classified as 1 by the model.
   cumfreq = # of observations of the validation sample classified as 1 by the model.
   cumn1y_expected_chance = # of 1 detected by choosing observations at random.
   cumn1ypercent = % of 1 detected by the model
   cumn1y = # of 1 detected by the model
   cumn1y_more_expected_chance = # of additional 1 detected by the model relative to random
                               = cumn1y-cumn1y_expected_chance
   lift = cumn1y/cumn1y_expected_chance
   A lift chart (with cumn1ypercent and cumfreqpercent) will be produced */

data outsample;
  set &data;
  phat = &phat;
  yreal = &yreal;
run;

data outsample;
  set outsample;
  if phat = . then delete;
  if yreal = . then delete;
  keep phat yreal;
run;

proc means data=outsample n noprint;
  var phat;
  output out=ntest n=ntest;
run;

data ntest;
  set ntest;
  keep ntest;
run;

proc means data=outsample sum noprint;
  var yreal;
  output out=n1test sum=n1test;
run;

data n1test;
  set n1test;
  keep n1test;
run;

proc rank data=outsample groups=&ngroups out=deciles descending;
  var phat;
run;

proc sort data=deciles out=deciles;
  by phat;
run;

proc means data=deciles sum noprint;
  var yreal;
  by phat;
  output out=lift sum=n1y;
run;

data lift;
  set lift;
  keep _freq_ phat n1y;
run;

proc sort data=lift out=lift;
  by phat;
run;

data lift;
  set lift;
  retain lagcumfreq lagcumn1y;
  if _N_=1 then do;
    cumfreq=_freq_; lagcumfreq=_freq_;
    cumn1y=n1y; lagcumn1y=n1y;
  end;
  if _N_>1 then do;
    cumfreq=lagcumfreq+_freq_; lagcumfreq=lagcumfreq+_freq_;
    cumn1y=lagcumn1y+n1y; lagcumn1y=lagcumn1y+n1y;
  end;
run;

data lift;
  merge lift ntest n1test;
run;

data lift;
  set lift;
  retain lagntest lagn1test;
  if _N_=1 then do; lagntest=ntest; lagn1test=n1test; end;
  if _N_>1 then do; ntest=lagntest; n1test=lagn1test; lagntest=lagntest; lagn1test=lagn1test; end;
  rename phat=percent;
run;

data lift;
  set lift;
  cumfreqpercent=cumfreq/ntest*100;
  cumn1ypercent=cumn1y/n1test*100;
run;

data lift;
  set lift;
  cumn1y_expected_chance=cumfreqpercent*n1test/100;
run;

data lift;
  set lift;
  cumn1y_more_expected_chance=cumn1y-cumn1y_expected_chance;
  lift=cumn1y/cumn1y_expected_chance;
run;

data lift;
  set lift;
  keep cumfreqpercent cumfreq cumn1y_expected_chance cumn1ypercent cumn1y
       cumn1y_more_expected_chance lift;
run;

data lift;
  set lift;
  label cumfreqpercent="% of obs classified 1 by model"
        cumfreq="Number of obs classified 1 by model"
        cumn1y_expected_chance="Number of 1 detected by choosing obs at random"
        cumn1ypercent="% of 1 detected by model"
        cumn1y="Number of 1 detected by model"
        cumn1y_more_expected_chance="Number of additional 1 detected by model relative to random";
run;

proc print data=lift label;
  var cumfreqpercent cumfreq cumn1y_expected_chance cumn1ypercent cumn1y
      cumn1y_more_expected_chance lift;
run;

ods graphics on;
proc sgplot data=lift;
  series y=cumn1ypercent x=cumfreqpercent / curvelabel="";
  series y=cumfreqpercent x=cumfreqpercent / curvelabel="";
  xaxis label=' % of obs classified 1 by model' VALUES=(10 to 100 by 10) grid;
  yaxis label='% of 1 detected by model ' VALUES=(10 to 100 by 10) grid;
run;
ods graphics off;
%MEND liftchart1;
```

And here is a big part of my code:

```sas
/* Step 3: Logistic regression */
proc logistic data=train2 plots(only)=(roc);
  /* Specify categorical variables in the CLASS statement */
  class housing loan poutcome month / param=ref;
  /* Model with both categorical and continuous variables */
  model y(ref='0') = age x21 x22 x23 x24 x25 x26 x27 x28 x29 x41 x42 x43
                     x51 x52 x53 x54 housing loan balance day duration pdays
                     previous poutcome / ctable;
  score data=train2 out=pred;
run;
/* ^^^ this code works and produces the classification tables; I picked the cutoff
   of 0.48 because it has the highest percent correct */

/* B) cross-validated probabilities (XP_1) for the ROC curve & AUC = gives the 2nd line
   to compare (ROC/AUC use all cutoffs; 0.48 only matters for the classification table) */
proc logistic data=train2;
  class housing loan poutcome month;
  model y(ref='0') = age x21 x22 x23 x24 x25 x26 x27 x28 x29 x41 x42 x43
                     x51 x52 x53 x54 housing loan balance day duration pdays
                     previous poutcome / ctable;
  output out=pred predprobs=crossvalidate;
run;

proc logistic data=pred;
  class housing loan poutcome month;
  model y(ref='0') = age x21 x22 x23 x24 x25 x26 x27 x28 x29 x41 x42 x43
                     x51 x52 x53 x54 housing loan balance day duration pdays
                     previous poutcome / ctable;
  roc pred=xp_1;
run;

/* lift: submit the prof's liftchart1 macro code first so the macro is compiled */
proc logistic data=train2;
  class housing loan poutcome month;
  model y(ref='0') = age x21 x22 x23 x24 x25 x26 x27 x28 x29 x41 x42 x43
                     x51 x52 x53 x54 housing loan balance day duration pdays
                     previous poutcome / ctable;
  output out=pred predprobs=crossvalidate;
run;

%liftchart1(pred,y,xp_1,10);
```

I hope this clarifies what I was trying to say.
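For reference, the cumulative lift table that liftchart1 builds can be reproduced more compactly. Below is a minimal, self-contained sketch on simulated data. The data set names toy, ranked, bydecile and liftsketch, and the macro variables ntest and n1test, are hypothetical. The sketch assumes y is numeric 0/1 and that xp_1 holds the predicted P(Y=1), as with the cross-validated XP_1 column from PREDPROBS=CROSSVALIDATE. It uses sum statements instead of the macro's retain/lag bookkeeping, so treat it as an illustration of the same arithmetic, not as the prof's code.

```sas
/* Hypothetical toy data: xp_1 plays the role of the predicted P(Y=1),
   and y (numeric 0/1) is drawn so that P(y=1) = xp_1 */
data toy;
  call streaminit(2024);
  do id = 1 to 1000;
    xp_1 = rand('uniform');
    y = rand('bernoulli', xp_1);
    output;
  end;
run;

/* Split into 10 groups; DESCENDING makes decile 0 the highest probabilities */
proc rank data=toy(where=(xp_1 ne . and y ne .)) out=ranked
          groups=10 descending;
  var xp_1;
  ranks decile;
run;

/* Totals: number of observations (ntest) and number of 1s (n1test) */
proc sql noprint;
  select count(*), sum(y)
    into :ntest trimmed, :n1test trimmed
    from ranked;
quit;

/* Observations and 1s per decile (NWAY keeps only the by-decile rows) */
proc summary data=ranked nway;
  class decile;
  var y;
  output out=bydecile(drop=_type_ _freq_) n=nobs sum=n1y;
run;

/* Running totals via sum statements (implicitly retained, start at 0) */
data liftsketch;
  set bydecile;
  cumfreq + nobs;
  cumn1y  + n1y;
  cumfreqpercent = 100 * cumfreq / &ntest;
  cumn1ypercent  = 100 * cumn1y / &n1test;
  cumn1y_expected_chance = &n1test * cumfreq / &ntest;
  lift = cumn1y / cumn1y_expected_chance;
run;

proc print data=liftsketch noobs;
  var decile cumfreqpercent cumn1ypercent cumn1y cumn1y_expected_chance lift;
run;
```

To apply the sketch to the real model, replace toy with the pred data set from the PREDPROBS=CROSSVALIDATE step. Because both the sketch and the macro rank with GROUPS=10 DESCENDING and accumulate from the top group, the numbers should match the macro call with (pred,y,xp_1,10). In the last row, cumfreqpercent and cumn1ypercent are always 100 and lift is 1, since by then every observation has been "classified 1".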