- Adjusted Prior in Enterprise Miner 7.1..Please hel...

01-24-2014 08:00 AM

Hi,

I have built a model in SAS Enterprise Miner, but in the sample I used the responders are overrepresented (1=0.0351 and 0=0.9649), so I used the Decisions option in EM and added adjusted priors to correct for it (1=0.0042 and 0=0.9958); please see the table below. I first ran the model without the adjusted priors to see the probabilities I get, then ran it again with the adjusted priors. But when I score new data I get exactly the same probabilities in both cases, so it doesn't seem that the probabilities have been corrected.

Could you please help? Am I missing a step?

Many Thanks

Alice

| Level | Count | Prior | Adjusted Prior |
|-------|-------|-------|----------------|
| 1 | 8865 | 0.0351 | 0.0042 |
| 0 | 243610 | 0.9649 | 0.9958 |
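
As a rough check of what the adjusted priors should do to a score, the usual prior correction can be written in odds form. The sketch below is plain Python, not Enterprise Miner output: the function name `adjust_for_priors` is made up for illustration, and it assumes the standard correction that rescales the odds by the ratio of true prior odds to sample prior odds.

```python
def adjust_for_priors(p_event, sample_prior, true_prior):
    """Rescale an event probability from the sample prior to the true prior.

    Assumes 0 <= p_event < 1 and priors strictly between 0 and 1.
    """
    # Odds of the event as estimated on the oversampled data.
    odds = p_event / (1.0 - p_event)
    # Ratio of true prior odds to sample prior odds.
    factor = (true_prior / (1.0 - true_prior)) / (sample_prior / (1.0 - sample_prior))
    adjusted_odds = odds * factor
    return adjusted_odds / (1.0 + adjusted_odds)


# Priors from the table above (level 1 is the event).
SAMPLE_PRIOR = 0.0351
TRUE_PRIOR = 0.0042

for raw in (0.0351, 0.10, 0.50):
    adj = adjust_for_priors(raw, SAMPLE_PRIOR, TRUE_PRIOR)
    print(f"raw={raw:.4f} -> adjusted={adj:.4f}")
```

Under this assumption, a score equal to the sample prior maps to the adjusted prior, and every score strictly between 0 and 1 changes. Identical probabilities in both runs would therefore suggest the adjustment is not being applied at scoring time.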

All Replies

01-31-2014 03:43 PM

Hi Alice,

It seems you are trying to use the Decisions node to train models on an oversampled training set (to address your overrepresented responders) and then have the predicted probabilities adjusted using the decision weights.

Your example looks very similar to the one described in the Applied Analytics Using Enterprise Miner course notes, specifically the Enrollment Management Case Study (Appendix A).

Try this on your Decisions node:

1. Go to the Decision Weights table.

2. For level 1 specify Decision 1 as 28.49 (1/0.0351=28.49)

3. For level 0 specify Decision 2 as 1.04 (1/0.9649 = 1.04)

4. Leave all the other decisions for the remaining levels as 0.

From the brief explanation in the course notes, this calculates a frequency variable for each observation based on the level you specified. That frequency weights your resulting model appropriately.
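
The weights in steps 2 and 3 are inverse priors. Here is a minimal Python sketch of that arithmetic, using the sample proportions from the original post. The rule that a decision is picked by maximum expected profit (posterior times weight) is my assumption and is not spelled out in this thread; `best_decision` is a hypothetical helper.

```python
# Sample proportions from the original post (level -> prior).
priors = {"1": 0.0351, "0": 0.9649}

# Inverse-prior weight for the matching decision; every other cell stays 0.
weights = {level: 1.0 / prior for level, prior in priors.items()}


def best_decision(p_event):
    """Pick the decision with the larger expected profit (assumed rule)."""
    profit_1 = p_event * weights["1"]          # decision 1 pays only on level 1
    profit_0 = (1.0 - p_event) * weights["0"]  # decision 2 pays only on level 0
    return "1" if profit_1 > profit_0 else "0"


print({level: round(w, 2) for level, w in weights.items()})
print(best_decision(0.05), best_decision(0.02))
```

Under that assumed rule, the cutoff for choosing decision 1 moves from 0.5 to the level 1 prior of 0.0351.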

Please feel free to follow up if I am missing anything on your modeling task.

Thanks!

Miguel

02-07-2014 03:50 PM

Many thanks, Miguel. I will try this and let you know.

05-28-2014 09:18 AM

Hi Alice, Miguel,

I ran into the same problem. If I understand the original question correctly, Alice, you were initially using the data source Decisions, not the Decisions node as Miguel's answer implies?

I'm asking because I am using the data source Decisions to set the priors (I have an oversampled rare-case data set directly as a data source, and I computed the adjusted prior probabilities from the original data set outside Enterprise Miner). The SAS Enterprise Miner documentation (for 12.1) specifically states that these priors are then automatically used to compute the posterior probabilities in the models, and that this should be all that is needed.

But I have the same situation with scoring: apparently the priors are not used in the scoring code. Is this a bug in the SAS code or in the SAS documentation?

I also do not completely understand Miguel's answer: if the priors are the problem here, and SAS does not use the adjusted priors in scoring, why not change the priors in the Decisions node after modeling instead of the decision weights?

Any help is appreciated...

Cheers,

Janne

05-29-2014 02:34 PM

What modeling node(s) are you using? In many of the nodes, the Decision Tree node for example, the adjustment for prior probabilities is performed directly in the Enterprise Miner procedure that is run (PROC ARBOR e.g.), so you won't see the adjustment being made in the score code. But you should see different posterior probabilities when running with adjusted priors vs. running without, for the nodes that support decisions. I just ran simple flows with the sample data HMEQ > Decision Tree, one with adjusted priors and one without, and the posterior probabilities were different.

05-30-2014 03:44 AM

Hi Wendy,

Thanks for the answer! I am indeed using a Decision Tree in this case, and the task is to predict the outcome probabilities of a categorical target variable (with several classes) for each data point.

Do I understand correctly that in your test the posterior probabilities in the scoring code are also correctly adjusted, not only the posteriors inside the Decision Tree node? In that case I must be doing something wrong. In an earlier attempt to get this working, I placed a Decisions node with adjusted priors after the modeling node (Decision Tree), and that way I got correctly adjusted posterior probabilities in the scoring as well (the P_* variables). As far as I can tell, I also get correct posterior probabilities inside the Decision Tree node, i.e. the modeling itself uses the adjusted priors; it is only the Score node afterwards that seems to ignore them, unless I use a Decisions node to set the adjusted priors explicitly again just before scoring.

Could my usage of the Score node somehow affect the outcome? I only use it to score the test and validation sets of the same data source, to get the optimized scoring code for copying into Enterprise Guide; I am not using separate score data inside Enterprise Miner.

Many thanks for your help!

Janne

Solution

07-10-2017
04:39 PM

05-30-2014 10:13 AM

Sorry, I probably wasn't clear. I just meant that when you use the Decisions node after a modeling node, you will see explicitly in the score code something like the following:

*** Update Posterior Probabilities;
P_BAD1 = P_BAD1 * 0.5 / 0.1994966442953;
P_BAD0 = P_BAD0 * 0.5 / 0.80050335570469;
drop _sum;
_sum = P_BAD1 + P_BAD0;
if _sum > 4.135903E-25 then do;
   P_BAD1 = P_BAD1 / _sum;
   P_BAD0 = P_BAD0 / _sum;
end;

Whereas if you specified your adjusted priors before modeling in the Input Data Source node, you won't see those calculations in the score code because the values for P_BAD1 and P_BAD0 are already taking those priors into account - the proc is doing the adjustment. But when applying this score code in the Score node, the adjustment is still being made. So running:

HMEQ(adjusted priors each 0.5) > Tree (not using priors/decisions for the split search or for pruning assessment)

should give you the same posteriors and misclassification as

HMEQ > Tree > Decisions(adjusted priors each 0.5).
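
The equivalence of the two flows above can be checked with a small Python sketch. `update_posteriors` mirrors the score-code snippet. `leaf_posteriors_with_priors` is a hypothetical stand-in for what a tree procedure might do when priors are supplied up front (weighting each class count in a leaf by new prior over training proportion). The thread does not show PROC ARBOR's internals, so that part is an assumption, as are the leaf counts.

```python
EPS = 4.135903e-25  # guard value copied from the score code


def update_posteriors(posteriors, old_priors, new_priors):
    """Post-hoc adjustment: scale by new/old prior, then renormalize."""
    scaled = {k: p * new_priors[k] / old_priors[k] for k, p in posteriors.items()}
    total = sum(scaled.values())
    if total > EPS:
        scaled = {k: v / total for k, v in scaled.items()}
    return scaled


def leaf_posteriors_with_priors(counts, old_priors, new_priors):
    """Assumed up-front adjustment: weight class counts, then normalize."""
    weighted = {k: n * new_priors[k] / old_priors[k] for k, n in counts.items()}
    total = sum(weighted.values())
    return {k: v / total for k, v in weighted.items()}


# Training proportions and adjusted priors taken from the score code.
old = {"BAD1": 0.1994966442953, "BAD0": 0.80050335570469}
new = {"BAD1": 0.5, "BAD0": 0.5}

leaf_counts = {"BAD1": 30, "BAD0": 70}  # made-up leaf for illustration
n = sum(leaf_counts.values())
raw = {k: c / n for k, c in leaf_counts.items()}

after = update_posteriors(raw, old, new)
upfront = leaf_posteriors_with_priors(leaf_counts, old, new)
print(after)
print(upfront)
```

Both paths give the same posteriors for a given leaf, provided the tree has the same splits. That matches the caveat about not using priors in the split search or pruning assessment.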

Hope that helps clarify.

Wendy Czika

SAS Enterprise Miner R&D

08-22-2016 09:57 AM

Dear Wendy,

Can I do the same as above with Decision Trees?

Best Regards,

Mohammed ElSofany

Data Scientist

08-23-2016 11:37 AM

Hi,
The answer is yes, because the Decisions node that makes the adjustment only consumes the input probabilities. It does not care which modeling node generated the scores, so it has no dependency on the modeling node before it. In other words, if you manually import preexisting model scores (you can do that with the Model Import node), it will run the adjustment as well.
The key difference between placing the Decisions node before and after the model is that placing it BEFORE has a direct impact on model selection and performance, and the effect also depends on the selection criterion. The real question is: if you run the adjustment AFTER the model is built, you do get the adjusted score (nominally, and 'properly'), but would you really accept that model, which was built without factoring in the decision in the first place?
Hope this helps. Thank you for using SAS.
Jason Xin
From SAS in Boston

08-30-2016 07:25 AM

Dear Jason,

Thanks for your reply. Please find below my answer to the question at the end of your post:

Since I use the propensity scores only to rank my scored base, not as absolute numbers, the answer is yes.

In the meantime, I will need to adjust the probabilities in order to get the real probability of the event occurring.

Do you agree with that ?

Thanks

BR

Mohammad ElSofany.
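
The point about ranking can be illustrated with a short Python sketch. It assumes the usual binary prior correction (scale each class by true prior over sample prior, then renormalize). The raw scores are made up; the priors are the ones from the original post. `adjust` and `rank_order` are illustrative names, not Enterprise Miner functions.

```python
def adjust(p_event, sample_prior, true_prior):
    """Binary prior correction: scale each class, then renormalize."""
    event = p_event * true_prior / sample_prior
    non_event = (1.0 - p_event) * (1.0 - true_prior) / (1.0 - sample_prior)
    return event / (event + non_event)


def rank_order(scores):
    """Indices of the scores sorted from highest to lowest."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical raw scores from a model trained on the oversampled data.
raw_scores = [0.02, 0.40, 0.07, 0.90, 0.15]
adjusted = [adjust(p, 0.0351, 0.0042) for p in raw_scores]

print(rank_order(raw_scores) == rank_order(adjusted))
print([round(p, 4) for p in adjusted])
```

Because this correction is monotone in the raw score, the rank order does not change. Only the absolute probability values change, which is why the adjustment matters when the scores are read as real event probabilities.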