- Enterprise Miner Assessment Score Rankings Output...

02-15-2011 12:06 PM

The documentation for output in EM is not thorough. Basically, I'd like to know:

1) how the column Mean Posterior Probability is computed, and

2) whether the values under Cumulative % Response are based on *OBSERVED* class values, not predicted.

Below is my output and the base SAS code I used to try to verify my conception of this output using the scored validation data from EM, but I'm not able to do so.

Could someone help me with interpretation? For more details, say I have the following Output From EM:

Percentile     Gain     Lift  Cumulative  % Response  Cumulative     Number of  Mean Posterior
                                    Lift              % Response  Observations     Probability
         5  143.210  2.43210     2.43210     100.000     100.000            10         0.81433
        10  106.728  1.70247     2.06728      70.000      85.000            10         0.69886
        15   78.354  1.21605     1.78354      50.000      73.333            10         0.64560
        20   64.167  1.21605     1.64167      50.000      67.500            10         0.59028
        25   50.790  0.97284     1.50790      40.000      62.000            10         0.53578
        30   45.926  1.21605     1.45926      50.000      60.000            10         0.51555
        35   44.516  1.35117     1.44516      55.556      59.420             9         0.47502
        40   35.459  0.72963     1.35459      30.000      55.696            10         0.44177
        45   28.437  0.72963     1.28437      30.000      52.809            10         0.41554
        50   20.377  0.48642     1.20377      20.000      49.495            10         0.39732
        55   18.258  0.97284     1.18258      40.000      48.624            10         0.36956
        60   16.495  0.97284     1.16495      40.000      47.899            10         0.33354
        65   18.777  1.45926     1.18777      60.000      48.837            10         0.31359
        70   18.080  1.08093     1.18080      44.444      48.551             9         0.30182
        75   13.388  0.48642     1.13388      20.000      46.622            10         0.27938
        80    9.291  0.48642     1.09291      20.000      44.937            10         0.25124
        85    4.233  0.24321     1.04233      10.000      42.857            10         0.20888
        90    2.476  0.72963     1.02476      30.000      42.135            10         0.15998
        95    2.200  0.97284     1.02200      40.000      42.021            10         0.08486
       100    0.000  0.54047     1.00000      22.222      41.117             9         0.00652

By exporting the scored validation data from EM to CSV, reading it into SAS, sorting by descending predicted probability, and using PROC SQL to add a row number (call this data set 'VALIDATE' for reference), I can recreate the values under 'Cumulative % Response' with:

/* CUM % RESP FOR 10TH PERCENTILE */

PROC FREQ DATA = VALIDATE;

TABLES OBS_TARGET;

WHERE ROW_NUM LE 20; /* FIRST 20 ROWS = TOP 10% */

RUN;

This tells me that the value in 'Cumulative % Response' is based on the *OBSERVED* response, not the predicted response. (I can write code to get the predicted response outside EM in base SAS if necessary.)
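As a cross-check, the 10th-percentile row of the table can be reproduced by hand from the per-slice columns. The relationships below are inferred from the numbers in the table itself, not taken from EM documentation, and the baseline of 41.117 is assumed to be the overall response rate shown as Cumulative % Response at the 100th percentile.

```latex
\begin{align*}
\text{Cumulative \% Response}_{10} &= \frac{10(100.000) + 10(70.000)}{10 + 10} = 85.000 \\
\text{Cumulative Lift}_{10} &= \frac{85.000}{41.117} \approx 2.067 \\
\text{Gain}_{10} &= 100\,(2.06728 - 1) = 106.728
\end{align*}
```

All three values agree with the reported row, which is consistent with the column being built from observed responses accumulated over the sorted rows.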

I am trying to interpret the values under Mean Posterior Probability. I just assumed it was the mean of the predicted probability for the target class level in that slice of the sorted data. However, if I invoke the following in base SAS code:

PROC MEANS DATA = VALIDATE;

VAR Predicted_Term_GPA_Less_than_1_; /* CREATED IN EM */

WHERE ROW_NUM LE 20; /* FIRST 20 ROWS = TOP 10% */

RUN;

I do not get the same number reported in the column for Mean Posterior Probability for the 10th percentile. I do match exactly for the 5th percentile, but it must be a coincidence.

Can someone tell me how this value is computed?

Accepted Solutions

Solution

07-10-2017
03:17 PM

Posted in reply to SlutskyFan

02-15-2011 04:00 PM

Replying to my own post, I figured out the mean posterior probability calculation. My mind was stuck on calculating cumulative quantities, but the column is a per-slice mean, not a cumulative one. Iterating the following code, selecting the row numbers that correspond to the given percentile, will match the output Miner gives for Mean Posterior Probability.
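The difference between the two readings can be seen with the table's own numbers. This is only an illustration using the rounded slice means from the table, under the assumption that the first 20 sorted rows make up the 5th and 10th percentile slices (10 rows each):

```latex
\begin{align*}
\text{cumulative mean, rows 1--20} &= \frac{10(0.81433) + 10(0.69886)}{20} \approx 0.7566 \\
\text{slice mean, rows 11--20} &= 0.69886
\end{align*}
```

For the first slice the cumulative mean and the slice mean cover the same rows, which would explain why the 5th percentile matched and the 10th did not.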

For the 10th Percentile (in the chart I linked to earlier - http://econometricsense.blogspot.com/2011/02/sample-assessment-score-rankings-for.html )

%LET FROM = 10;

%LET TO = 20;

PROC MEANS DATA = VALIDATE;

VAR Predicted__TERM_GPA_Less_than_1_;

WHERE ROW_NUM > &FROM AND ROW_NUM LE &TO;

RUN;
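Rather than re-running the step above once per percentile, all slices could be computed in one pass. The following is a minimal sketch, not EM's documented method: SLICES, PERCENTILE, and NOBS are hypothetical names, and the slice rule is an assumption. It does reproduce the Number of Observations column in the table (10, 10, 10, 10, 10, 10, 9, ...) if VALIDATE has 197 rows, the sum of that column.

```sas
/* Count the rows so the slice rule can use the total N. */
PROC SQL NOPRINT;
  SELECT COUNT(*) INTO :NOBS FROM VALIDATE;
QUIT;

/* Assumes VALIDATE is sorted by descending predicted probability */
/* and ROW_NUM runs 1, 2, 3, ... in that order.                   */
DATA SLICES;
  SET VALIDATE;
  /* 20 slices of 5% each (assumed rule, not from EM documentation): */
  /* with 197 rows, rows 1-10 -> 5, rows 11-20 -> 10, and so on.     */
  PERCENTILE = 5 * (FLOOR((ROW_NUM - 1) * 20 / &NOBS) + 1);
RUN;

/* One row of output per slice: count and mean predicted probability. */
PROC MEANS DATA = SLICES N MEAN MAXDEC=5;
  CLASS PERCENTILE;
  VAR Predicted__TERM_GPA_Less_than_1_;
RUN;
```

If the N column of this output does not match Number of Observations in the EM table, the slice rule differs from the one EM used and the means should not be expected to match either.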

I can't believe I bothered technical support over this. I just don't trust my results without solid documentation, or just checking the raw data myself.

All Replies

Posted in reply to SlutskyFan

02-15-2011 12:17 PM

Unfortunately, the chart did not publish the way it appeared when pasted in the editor. See

http://econometricsense.blogspot.com/2011/02/sample-assessment-score-rankings-for.html

for more readable output. Thanks.
