SlutskyFan
Obsidian | Level 7
The documentation for output in SAS Enterprise Miner (EM) is not thorough. Basically, I'd like to know

1) how the column Mean Posterior Probability is computed. I'd also like to verify

2) that the values under Cumulative % Response are based on *OBSERVED* class values, not predicted ones.

Below is my output and the base SAS code I used to try to verify my understanding of this output using the scored validation data from EM, but I'm not able to do so.

Could someone help me with the interpretation? For more details, say I have the following output from EM:

Percentile                    Cumulative         %  Cumulative     Number of  Mean Posterior
               Gain     Lift        Lift  Response  % Response  Observations     Probability
         5  143.210  2.43210     2.43210   100.000     100.000            10         0.81433
        10  106.728  1.70247     2.06728    70.000      85.000            10         0.69886
        15   78.354  1.21605     1.78354    50.000      73.333            10         0.64560
        20   64.167  1.21605     1.64167    50.000      67.500            10         0.59028
        25   50.790  0.97284     1.50790    40.000      62.000            10         0.53578
        30   45.926  1.21605     1.45926    50.000      60.000            10         0.51555
        35   44.516  1.35117     1.44516    55.556      59.420             9         0.47502
        40   35.459  0.72963     1.35459    30.000      55.696            10         0.44177
        45   28.437  0.72963     1.28437    30.000      52.809            10         0.41554
        50   20.377  0.48642     1.20377    20.000      49.495            10         0.39732
        55   18.258  0.97284     1.18258    40.000      48.624            10         0.36956
        60   16.495  0.97284     1.16495    40.000      47.899            10         0.33354
        65   18.777  1.45926     1.18777    60.000      48.837            10         0.31359
        70   18.080  1.08093     1.18080    44.444      48.551             9         0.30182
        75   13.388  0.48642     1.13388    20.000      46.622            10         0.27938
        80    9.291  0.48642     1.09291    20.000      44.937            10         0.25124
        85    4.233  0.24321     1.04233    10.000      42.857            10         0.20888
        90    2.476  0.72963     1.02476    30.000      42.135            10         0.15998
        95    2.200  0.97284     1.02200    40.000      42.021            10         0.08486
       100    0.000  0.54047     1.00000    22.222      41.117             9         0.00652
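
The derived columns in this table can be recomputed from just % Response and Number of Observations, which is a useful sanity check on how they relate. The following is a minimal sketch, not EM's own code. It assumes that Lift is a bin's % Response divided by the overall response rate, that Cumulative Lift is the same ratio applied to Cumulative % Response, and that Gain is 100 * (Cumulative Lift - 1). These definitions are inferred from the numbers in the table rather than taken from the EM documentation. The data set and variable names (BINS, RANKINGS, RESPONDERS, and so on) are made up for illustration.

```sas
/* Per-bin % response and bin sizes, copied from the table above */
DATA BINS;
    INPUT PERCENTILE PCT_RESP N_OBS;
    /* Observed events in the bin; rounding undoes the 3-decimal display */
    RESPONDERS = ROUND(PCT_RESP * N_OBS / 100);
    DATALINES;
5 100.000 10
10 70.000 10
15 50.000 10
20 50.000 10
25 40.000 10
30 50.000 10
35 55.556 9
40 30.000 10
45 30.000 10
50 20.000 10
55 40.000 10
60 40.000 10
65 60.000 10
70 44.444 9
75 20.000 10
80 20.000 10
85 10.000 10
90 30.000 10
95 40.000 10
100 22.222 9
;
RUN;

/* Overall response rate (in percent), used as the baseline for lift */
PROC SQL NOPRINT;
    SELECT 100 * SUM(RESPONDERS) / SUM(N_OBS) FORMAT = BEST32.
        INTO :BASE_RATE TRIMMED
        FROM BINS;
QUIT;

DATA RANKINGS;
    SET BINS;
    CUM_RESP + RESPONDERS;   /* sum statements keep running totals */
    CUM_N    + N_OBS;
    CUM_PCT_RESP = 100 * CUM_RESP / CUM_N;
    LIFT         = PCT_RESP / &BASE_RATE;       /* assumed definition */
    CUM_LIFT     = CUM_PCT_RESP / &BASE_RATE;   /* assumed definition */
    GAIN         = 100 * (CUM_LIFT - 1);        /* assumed definition */
RUN;

PROC PRINT DATA = RANKINGS NOOBS;
    VAR PERCENTILE GAIN LIFT CUM_LIFT PCT_RESP CUM_PCT_RESP N_OBS;
RUN;
```

For the 10th percentile, the running totals are 10 + 7 = 17 observed events in 20 rows, so the cumulative % response is 85%, which is the value EM reports.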


I exported the scored validation data from EM to CSV, read it into SAS, sorted it by descending predicted probability, and used PROC SQL to add a row number (call this data set 'VALIDATE' for reference). With that, I can recreate the values under 'Cumulative % Response' with:

/* CUM % RESP FOR 10TH PERCENTILE */
/* ROWS 1-20 = TOP TWO 5% BINS OF THE SORTED DATA */

PROC FREQ DATA = VALIDATE;
TABLES OBS_TARGET;
WHERE ROW_NUM LE 20;
RUN;

This tells me that the value in 'Cumulative % Response' is based on the *OBSERVED* response, not the predicted response. (I can write code to get the predicted response outside EM in base SAS if necessary.)

I am trying to interpret the values under Mean Posterior Probability. I just assumed it was the mean of the predicted probability for the target class level in that slice of the sorted data. However, if I invoke the following in base SAS code:

PROC MEANS DATA = VALIDATE;
VAR Predicted_Term_GPA_Less_than_1_; /* CREATED IN EM */
WHERE ROW_NUM LE 20;
RUN;

I do not get the number reported in the Mean Posterior Probability column for the 10th percentile. I do match exactly for the 5th percentile, but that must be a coincidence.

Can someone tell me how this value is computed?
1 ACCEPTED SOLUTION

Accepted Solutions
SlutskyFan
Obsidian | Level 7
Replying to my own post: I figured out the Mean Posterior Probability calculation. My mind was stuck on calculating cumulative quantities, but this column is the mean within each percentile bin, not a running mean over all rows up to that percentile. Iterating the following code over the row numbers that belong to a given percentile matches the output Enterprise Miner gives for Mean Posterior Probability.

For the 10th percentile (in the chart I linked to earlier, http://econometricsense.blogspot.com/2011/02/sample-assessment-score-rankings-for.html):

%LET FROM = 10;
%LET TO = 20;

PROC MEANS DATA = VALIDATE;
VAR Predicted__TERM_GPA_Less_than_1_;
WHERE ROW_NUM > &FROM AND ROW_NUM LE &TO;
RUN;
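
Rather than editing FROM and TO by hand for each percentile, the same comparison can be done in one pass. The sketch below uses a small made-up data set (TOY, with a hypothetical score variable P_TARGET) in place of VALIDATE, and it assumes fixed slices of 10 rows per bin. That is a simplification: the EM output above has some bins with 9 observations, so the real bin boundaries would have to come from the Number of Observations column.

```sas
/* Hypothetical stand-in for the scored validation data: 40 rows,  */
/* already sorted by descending predicted probability.             */
DATA TOY;
    DO ROW_NUM = 1 TO 40;
        P_TARGET = 1 - ROW_NUM / 41;   /* made-up scores */
        BIN = CEIL(ROW_NUM / 10);      /* assumption: 10 rows per bin */
        OUTPUT;
    END;
RUN;

/* Mean predicted probability within each bin (non-cumulative) */
PROC MEANS DATA = TOY NOPRINT NWAY;
    CLASS BIN;
    VAR P_TARGET;
    OUTPUT OUT = BIN_MEANS (DROP = _TYPE_ _FREQ_) MEAN = BIN_MEAN N = N_OBS;
RUN;

/* Running (cumulative) mean, for comparison with the per-bin mean */
DATA COMPARE;
    SET BIN_MEANS;
    CUM_SUM + BIN_MEAN * N_OBS;   /* running sum of scores */
    CUM_N   + N_OBS;              /* running row count */
    CUM_MEAN = CUM_SUM / CUM_N;
RUN;

PROC PRINT DATA = COMPARE NOOBS;
    VAR BIN N_OBS BIN_MEAN CUM_MEAN;
RUN;
```

In the first bin, BIN_MEAN and CUM_MEAN are identical by construction, which is why the 5th percentile matched in the original attempt. From the second bin on they diverge, and BIN_MEAN is the quantity that corresponds to the per-bin calculation in the code above.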

I can't believe I bothered technical support over this. I just don't trust my results without solid documentation, or just checking the raw data myself.


2 REPLIES 2
SlutskyFan
Obsidian | Level 7
Unfortunately, the chart did not publish the way it was pasted in the editor. See

http://econometricsense.blogspot.com/2011/02/sample-assessment-score-rankings-for.html

for more readable output. Thanks.
