The use of PEVENT= in Proc Logistic

Posted 09-12-2020 10:51 PM

Hi,

I am training a binary classification model using Proc Logistic. The classes are imbalanced at about 10% for the event 1 and 90% for the non-event 0. I balanced the training set to about 50:50 using sampling before training. The code used is

```
proc logistic data=work.train_stdize outmodel=mydata.Model_1. namelen=32;
  class &class_var. / param=ref;
  model responder(event='1') = &class_var. &num_var.
        / stb lackfit ctable pprob=(0.0 to 1.0 by 0.1) /* pevent=0.1 */;
  weight weight;
  score data=work.train_stdize fitstat out=mydata.train_scr outroc=mydata.troc;
run;
```

I included the CTABLE option to generate the classification table at each probability cutoff from 0 to 1 in steps of 0.1:

`ctable pprob=(0.0 to 1.0 by 0.1)`

Do I need to include the pevent option? Yes or No and Why?

`pevent=0.1`

Thanks and much appreciated,

Lobbie

Yes, PEVENT= produces the results as if the data set were still 10% in one category and 90% in the other, even though you created a model based on 50% in each category.

--

Paige Miller

Yes. PEVENT= would affect the probability of being classified as Good or Bad.

By default, a predicted probability above 50% is classified as good; PEVENT= would adjust that 50% cutoff according to its value.

Hmmm

@Ksharp wrote:

Yes. PEVENT= would affect the probability of being classified as Good or Bad.

By default, a predicted probability above 50% is classified as good; PEVENT= would adjust that 50% cutoff according to its value.

I think you are right, @Ksharp, and my response above was not correct. My response above was about the (similarly named) PRIOREVENT= option, not the PEVENT= option (I think).

So, where @Lobbie says "The classes are imbalanced at about 10% for the event 1 and 90% for the non-event 0. I balanced the training set to about 50:50 using sampling before training", I think he really wants to use the PRIOREVENT= option and not the PEVENT= option, to get proper predictions from a dataset that is 50:50, when the original data is 10:90. But again, I add **I THINK**, because I really need to read the documentation a few more times.

Or maybe someone else can jump in and straighten all this out, saving me some reading and thinking 😉

--

Paige Miller

Paige,

I think both are the same thing. You didn't make a mistake.

PEVENT= specifies prior event probabilities.

Ok, thanks @Ksharp , but I'm still going to take some time and read the documentation carefully.

--

Paige Miller

Hi @PaigeMiller and @Ksharp ,

According to the documentation, PEVENT= specifies prior event probabilities and applies only in the MODEL statement when the CTABLE option is also specified. My question was whether I should add PEVENT= when generating the classification table, because I trained the model on balanced classes while the true proportion of my classes is 10:90. **I think the answer is "Yes, I should add PEVENT=0.1 alongside the CTABLE option"?**

PRIOREVENT=, on the other hand, is used in the SCORE statement according to the documentation. I found that if I fit the model using the offset method, I need to add PRIOREVENT=0.1 in the SCORE statement when scoring, so that the predicted probabilities are adjusted for the prior.

If I fit the model using the weight method, I do not need PRIOREVENT= when scoring, because the adjustment is already reflected in the intercept and coefficients (@Rick_SAS mentioned this in one of his replies; sorry, I can't seem to find the thread now).
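
For readers following the arithmetic, here is a sketch of the standard prior correction for a binary model fitted on a rebalanced sample. The symbols are introduced only for illustration and do not come from the thread: $\rho_1$ is the event proportion in the training sample (0.5 here), $\pi_1$ is the event proportion in the population (0.1 here), and $\hat p$ is the unadjusted predicted event probability. The scoring section of the PROC LOGISTIC documentation should be consulted for the exact form that PRIOREVENT= applies.

$$
\hat p_{\text{adj}}
  = \frac{\hat p \,\dfrac{\pi_1}{\rho_1}}
         {\hat p \,\dfrac{\pi_1}{\rho_1} + (1-\hat p)\,\dfrac{1-\pi_1}{1-\rho_1}},
\qquad
\operatorname{logit}(\hat p_{\text{adj}})
  = \operatorname{logit}(\hat p) - \ln\!\frac{\rho_1\,(1-\pi_1)}{(1-\rho_1)\,\pi_1}
$$

With $\rho_1 = 0.5$ and $\pi_1 = 0.1$, the logit shift is $\ln 9 \approx 2.197$, so an unadjusted prediction of 0.5 maps to an adjusted prediction of 0.1.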

Well, that's good to know, and as I said, I still have some thinking to do! Thanks!

--

Paige Miller

No worries @PaigeMiller, and please do let me know what your thoughts are later. Really appreciate it.

My hunch is both you and @Ksharp are right in your previous answers, else why would SAS have a PEVENT option to work with the CTABLE option? Because CTABLE does not do prior adjustments by default.

Hello @Lobbie and @Ksharp. I don't see how PRIOREVENT= and PEVENT= produce the same results.

```
/* Make up some data, with 10% value of 0 and 90% value of 1 */
data a;
  do i=1 to 1000;
    if i<=100 then y=0;
    else y=1;
    x1=rand('normal');
    x2=rand('normal');
    output;
  end;
run;

/* Perform Logistic Regression */
proc logistic data=a;
  model y(event='1')=x1 x2;
  output out=preds predicted=pred;
run;

/* Oversample to 50-50, PEVENT=0.9 */
proc logistic data=a(where=(i<=200));
  model y(event='1')=x1 x2 / pevent=0.9 ctable;
  output out=preds2 predicted=pred2;
run;

/* Oversample to 50-50, PRIOREVENT=0.9 */
proc logistic data=a(where=(i<=200));
  model y(event='1')=x1 x2;
  score out=preds3 priorevent=0.9;
run;
```

PRIOREVENT does what I think should be done given my understanding of the original problem. It's not clear to me how PEVENT applies here.

--

Paige Miller

Paige,

Very interesting. I also want to know.

It seems that PEVENT= has nothing to do with the predicted probability. It only affects the classification table (CTABLE).

According to the documentation:

PEVENT=value | (list)

specifies one prior probability or a list of prior probabilities for the event of interest. The false positive and false negative rates are then computed as posterior probabilities by Bayes' theorem.

That means the OP does not need PEVENT=. Only priorevent=0.9 would adjust the predicted probability.

I misled the OP; I was wrong.

@StatDave @Rick_SAS @SteveDenham, could you take a look?
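
The documentation excerpt quoted above refers to Bayes' theorem; as a sketch of what that computation looks like, the posterior quantities can be written in terms of sensitivity, specificity, and the prior. The symbols are assumptions introduced for illustration: $Se$ and $Sp$ are the sensitivity and specificity at a given cutoff, and $\pi$ is the prior event probability supplied through PEVENT=.

$$
\mathrm{PPV} = \frac{Se\,\pi}{Se\,\pi + (1-Sp)\,(1-\pi)},
\qquad
\mathrm{NPV} = \frac{Sp\,(1-\pi)}{Sp\,(1-\pi) + (1-Se)\,\pi}
$$

For example, with $Se = Sp = 0.8$, a prior of $\pi = 0.5$ gives $\mathrm{PPV} = 0.8$, while $\pi = 0.1$ gives $\mathrm{PPV} = 0.08 / 0.26 \approx 0.31$. If the "false positive rate" in the quoted passage is read as the share of predicted events that are actually nonevents, it corresponds to $1 - \mathrm{PPV}$; that reading is an assumption and should be checked against the "Classification Table" details section. Note that only the table entries change with $\pi$; the predicted probabilities themselves do not appear in these formulas.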

@StatDave wrote:

I think the red highlighted text clears things up for me. Thanks!

--

Paige Miller

@StatDave, yes, I am following the 22601 note and am using the WEIGHT statement when fitting my model. The "Details: Classification Table" section of the LOGISTIC documentation states that PEVENT= should be used, because I fitted the model on a balanced training set when the original data was 10:90.

However, when I did not specify the PEVENT= option for the CTABLE during model training and manually calculated PPV and NPV from the scored data, the results matched.

The only explanation I can think of for why I do not need the PEVENT= option, contrary to the recommendation in the documentation, is that I fitted the model using the WEIGHT statement. All parameters are adjusted accordingly and are used to compute the CTABLE and the P_1 probabilities in the scored data set. This is also why I do not need to specify PRIOREVENT= in the SCORE statement when scoring.

Am I correct? Thanks.
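
As a minimal sketch of the weight method mentioned in this post, the weights for a rebalanced sample are commonly taken as the population proportion divided by the sample proportion for each class. Everything below is illustrative: the data set `work.toy_balanced`, the macro variables `rho1` and `pi1`, and the numeric coding of `responder` are assumptions, not the poster's actual setup.

```sas
/* Assumed values: rho1 = event share in the balanced sample,
   pi1 = event share in the original population. */
%let rho1 = 0.5;
%let pi1  = 0.1;

/* Toy balanced sample: 100 events and 100 nonevents (hypothetical data) */
data work.toy_balanced;
  do i = 1 to 200;
    responder = (i <= 100);                              /* 1 = event, 0 = nonevent */
    if responder = 1 then weight = &pi1 / &rho1;         /* 0.1 / 0.5 = 0.2 */
    else weight = (1 - &pi1) / (1 - &rho1);              /* 0.9 / 0.5 = 1.8 */
    output;
  end;
run;

/* Check: the weights should restore the 10:90 split */
proc means data=work.toy_balanced sum;
  class responder;
  var weight;
run;
```

In this toy sample the weights sum to 20 for events and 180 for nonevents, so the weighted data again carries the 10:90 proportions even though the row counts are 50:50. Whether this makes PEVENT= unnecessary for the classification table is the open question of the post and is not settled by this sketch.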
