Kanyange
Fluorite | Level 6

Hi,

I have built a model in SAS Enterprise Miner, but in the sample I used the responders are over-represented (1 = 0.0351 and 0 = 0.9649), so I used Decisions in EM and added adjusted priors to correct this (1 = 0.0042 and 0 = 0.9958); please see the table below. I ran the model first without the adjusted priors to see which probabilities I get, then ran it again with the adjusted priors. But when I score new data I get exactly the same probabilities in both cases, so it does not seem that the probabilities have been corrected.

Could you please help? Am I missing a step?

Many Thanks

Alice

Level    Count     Prior     Adjusted Prior
1        8865      0.0351    0.0042
0        243610    0.9649    0.9958
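
For context, the correction that the adjusted priors are supposed to apply to the model's posterior probabilities amounts to the re-weighting sketched below. This is only a minimal DATA step sketch, assuming a scored table work.scored with posterior columns P_TARGET1 and P_TARGET0 (table and column names are hypothetical), using the priors from the table above:

data work.scored_adj;
   set work.scored;                   /* hypothetical scored table */
   /* re-weight each posterior by (adjusted prior / sample prior) */
   _p1 = P_TARGET1 * 0.0042 / 0.0351;
   _p0 = P_TARGET0 * 0.9958 / 0.9649;
   /* renormalize so the corrected probabilities sum to 1 */
   _sum = _p1 + _p0;
   if _sum > 0 then do;
      P_ADJ1 = _p1 / _sum;
      P_ADJ0 = _p0 / _sum;
   end;
   drop _p1 _p0 _sum;
run;

If scoring with and without the adjusted priors returns identical P_* values, a re-weighting like this has evidently not been applied anywhere in the flow.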

9 REPLIES
a123
Calcite | Level 5

Hi Alice,

It seems to me you are trying to use the Decisions node to train models on an oversampled training set (to address your over-represented responders) and then have the predicted probabilities adjusted using the decision weights.

Your example looks very similar to the one described in the Applied Analytics Using Enterprise Miner course notes, specifically in the Enrollment Management Case Study (Appendix A).

Try this on your Decisions node:

1. Go to the Decision Weights table.

2. For level 1 specify Decision 1 as 28.49 (1/0.0351=28.49)

3. For level 0 specify Decision 2 as 1.04 (1/0.9649 ≈ 1.04)

4. Leave all the other decisions for the remaining levels as 0.

From the brief explanation in the course notes, what this does is compute a frequency variable for each observation based on its target level; this frequency weights the resulting model appropriately.
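
As a rough analogue of that weighting idea outside Enterprise Miner, one could weight each training observation by the value for its target level; below is a minimal sketch, assuming a training table work.train with a binary target resp and inputs x1 and x2 (all names are hypothetical), using the inverse sample proportions from the steps above:

data work.train_w;
   set work.train;
   /* per-observation weight: inverse of the level's sample proportion */
   if resp = 1 then w = 1 / 0.0351;   /* responders:     about 28.49 */
   else             w = 1 / 0.9649;   /* non-responders: about 1.04  */
run;

proc logistic data=work.train_w;
   model resp(event='1') = x1 x2;
   weight w;                          /* weight observations rather than altering the data */
run;

This is only meant to illustrate the weighting; inside Enterprise Miner the Decision Weights table takes care of it for you.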

Please feel free to follow up if I am missing anything on your modeling task.

Thanks!

Miguel

Kanyange
Fluorite | Level 6

Many thanks, Miguel, I will try it and let you know.

PahaKeisari
Calcite | Level 5

Hi Alice, Miguel,

I bumped into the same problem. If I understand the original question correctly, Alice, you were initially using the data source Decisions, not the Decisions node that Miguel's answer implies?

I'm asking because I, at least, am using the data source Decisions to set the priors (I have an oversampled rare-event data set directly as a data source, and I computed the adjusted prior probabilities from the original data set outside Miner), because the SAS Enterprise Miner documentation (for 12.1) specifically states that these priors are then automatically used to compute the posterior probabilities in the models, and that this should be all that is needed.

But I have the same situation with scoring: apparently the priors are not used in the score code. I wonder whether this is a bug in the SAS code or in the SAS documentation?

Also, I do not completely understand Miguel's answer: if the priors are the problem here, and SAS does not use the adjusted priors in scoring, why not change the priors in a Decisions node after modeling instead of using decision weights?

Any help is appreciated...

Cheers,

Janne

WendyCzika
SAS Employee

What modeling node(s) are you using? In many of the nodes, the Decision Tree node for example, the adjustment for prior probabilities is performed directly in the Enterprise Miner procedure that is run (PROC ARBOR, for example), so you won't see the adjustment being made in the score code. But for the nodes that support decisions, you should see different posterior probabilities when running with adjusted priors versus running without them. I just ran simple flows with the sample data (HMEQ > Decision Tree), one with adjusted priors and one without, and the posterior probabilities were different.
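
One way to check this is to export the scored tables from the two runs and compare the posterior columns directly; here is a minimal sketch, assuming the exported tables are called work.scored_noprior and work.scored_prior and the target is BAD (names are hypothetical):

proc compare base=work.scored_noprior
             compare=work.scored_prior
             criterion=1e-8;          /* tolerance for tiny numeric differences */
   var P_BAD1 P_BAD0;                 /* posterior probability columns */
run;

If the adjusted priors are being honored, PROC COMPARE should report differences in these columns between the two runs.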

PahaKeisari
Calcite | Level 5

Hi Wendy,

Thanks for the answer! I am indeed using the Decision Tree node in this case as well, and the task is to predict the outcome probabilities of a categorical target variable (with several classes) for each data point.

Do I understand correctly that in your test the posterior probabilities in the score code also come out correctly adjusted, not only the posteriors inside the Decision Tree node? In that case I must be doing something wrong here. Earlier, in an attempt to get this working, I tested the effect of a Decisions node with adjusted priors placed after the modeling node (Decision Tree), and that way I did get correctly adjusted posterior probabilities in the scoring as well (the P_* variables). As far as I can tell, the posteriors inside the Decision Tree node are also correct, i.e. the modeling itself uses the adjusted priors; it is only the Score node after it that seems to ignore them, unless I use a Decisions node to explicitly set the adjusted priors again just before scoring.

Could it be that my usage of the Score node somehow affects the outcome here? I am only using it to score the same data source's test and validation sets in order to get the optimized score code to copy into Enterprise Guide; I am not using a separate score data set inside Miner.

Many thanks for your help!

Janne
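
(As an aside, for the Enterprise Guide side of this, the exported score code is usually applied with a %INCLUDE inside a DATA step; a minimal sketch, with a hypothetical file path and table names:)

data work.new_scored;
   set work.new_data;                 /* data to be scored */
   /* exported Enterprise Miner score code, including any posterior
      update generated by a Decisions node placed after the model */
   %include 'C:\EM_exports\optimized_score_code.sas';
run;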

WendyCzika
SAS Employee

Sorry, I probably wasn't clear.  I just meant that when you use the Decisions node after a modeling node, you will see explicitly in the score code something like the following:

*** Update Posterior Probabilities;
P_BAD1 = P_BAD1 * 0.5 / 0.1994966442953;
P_BAD0 = P_BAD0 * 0.5 / 0.80050335570469;
drop _sum; _sum = P_BAD1 + P_BAD0 ;
if _sum > 4.135903E-25 then do;
   P_BAD1 = P_BAD1 / _sum;
   P_BAD0 = P_BAD0 / _sum;
end;

Whereas if you specify your adjusted priors before modeling, in the Input Data Source node, you won't see those calculations in the score code, because the values of P_BAD1 and P_BAD0 already take those priors into account; the procedure itself does the adjustment. But when you apply this score code in the Score node, the adjustment is still being made. So running:

HMEQ(adjusted priors each 0.5) > Tree (not using priors/decisions for the split search or for pruning assessment)

should give you the same posteriors and misclassification as

HMEQ > Tree > Decisions(adjusted priors each 0.5).

Hope that helps clarify.

Wendy Czika

SAS Enterprise Miner R&D

mohammad__101
Fluorite | Level 6

Dear Wendy,

Can I do the same as above with Decision Trees?

 

Best Regards,

Mohammed ElSofany

Data Scientist

JasonXin
SAS Employee
Hi,

The answer is yes. The Decisions node used to make the adjustment just consumes input probabilities; it does not care how, or by which modeling node, the scores were generated. In that sense it has no dependency on the modeling node before it. In other words, if you manually import pre-existing model scores (you can do that with the Model Import node), it will run the adjustment on those as well.

The key difference between placing the Decisions node before versus after modeling is that placing it BEFORE has a direct impact on model selection and performance, and it is also sensitive to the selection criterion. The real question is: if you run the adjustment AFTER the model is built, so that you get the adjusted scores (nominally, and 'properly'), would you really accept the model, the one built without factoring in the decisions in the first place?

Hope this helps. Thank you for using SAS.

Jason Xin

SAS in Boston
mohammad__101
Fluorite | Level 6

Dear Jason,

 

Thanks for your reply; please find below my feedback on the question at the end of it:

 

Since, in the end, I am using the propensity scores to rank my scored base by score, and not using the scores as absolute numbers, the answer would be yes.

At the same time, I will need to adjust the probabilities in order to get the real probability of the event occurring.

 

Do you agree with that?

 

Thanks

BR

Mohammad ElSofany.
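
For what it's worth, for a binary target the prior adjustment is a monotone transformation of the raw posterior, so the ranking is unchanged and only the probability values move. Below is a small sketch with made-up raw scores and the priors from the original post (table and variable names are hypothetical):

data work.check;
   input p_raw @@;                        /* made-up raw posteriors from the oversampled model */
   _num = p_raw       * 0.0042 / 0.0351;
   _den = (1 - p_raw) * 0.9958 / 0.9649;
   p_adj = _num / (_num + _den);          /* corrected event probability */
   drop _num _den;
   datalines;
0.10 0.25 0.50 0.75 0.90
;
run;

proc print data=work.check;
   var p_raw p_adj;                       /* order is preserved; adjusted values are much smaller */
run;

Ranking by the raw score or by the adjusted score therefore gives the same order, but only the adjusted value should be read as the actual probability of the event.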
