
Looking at percentiles in noncumulative response charts when assessing data mining models.

Contributor
Posts: 41


Hello everyone,

I have a little existential question about looking at lift charts when assessing models in SAS Enterprise Miner. I hope this is a simple and straightforward question.

Consider the regression model named Reg2S in the screenshot I attached below (the turquoise line). I am trying to find out at which decile Reg2S crosses the baseline. Looking at the screenshot with the naked eye, I would say it crosses the baseline in the fifth decile. However, if I use the View Info feature and click on the plot line at the point where it crosses the baseline, the tooltip says it crosses at the sixth decile, not the fifth.

08-05-2013 23-03-47.png

So I was wondering which one is correct. Is Reg2S crossing the baseline at the fifth decile or at the sixth decile? I am quite confused about this.

I asked my lecturer about it but I ended up being even more confused! I sent my lecturer this screenshot and was basically told that in this case Reg2S crosses the baseline at the fifth decile, not the sixth. As I was confused about why the View Info tooltip says it is the sixth and not the fifth decile, I then sent my lecturer a screenshot from another chart where I noticed similar behaviour. In a cumulative response chart, I clicked on a point in a plot that seemed (to the naked eye) to be in the first decile, but the View Info tool said it was in the second decile. This time my lecturer told me the View Info tool was correct.

So I am now extremely confused about when to rely on the lines separating the deciles in the chart and when to rely on what the View Info tool tells me.

Could someone please advise?

Regards,

P.

Respected Advisor
Posts: 2,655

Re: Looking at percentiles in noncumulative response charts when assessing data mining models.

I'm a little confused.  Looking at the plot of the turquoise line, I see it crosses at about the 56th percentile.  To me that is clearly in the sixth decile, using the following mapping:

percentile range          decile
0 to 9.9999...               1
10 to 19.9999...             2
20 to 29.9999...             3
30 to 39.9999...             4
40 to 49.9999...             5
50 to 59.9999...             6
60 to 69.9999...             7
70 to 79.9999...             8
80 to 89.9999...             9
90 to 99.9999...            10

Does this eliminate (or at least reduce) your confusion?  Don't depend on the plot to count deciles, as the vertical axis is set at the beginning of decile 2.
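
For what it's worth, the mapping above boils down to a one-line rule. Here is a small Python sketch of it; the function name is made up for this post and is not anything Enterprise Miner exposes, it only restates the table:

```python
def decile_from_percentile(percentile: float) -> int:
    """Map a percentile in [0, 100) to a 1-based decile, as in the table above."""
    if not 0 <= percentile < 100:
        raise ValueError("percentile must be in [0, 100)")
    # 0-9.99 -> 1, 10-19.99 -> 2, ..., 90-99.99 -> 10
    return int(percentile // 10) + 1


print(decile_from_percentile(56))  # 6
```

So a crossing at roughly the 56th percentile lands in decile 6, even though it sits to the left of the axis line labelled 60.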

Steve Denham

Contributor
Posts: 41

Re: Looking at percentiles in noncumulative response charts when assessing data mining models.

Hi Steve,

Thank you for trying to help me out with this.

If I understood you correctly, you are saying that the Reg2S line crosses the baseline in the 6th decile, at approximately the 56th percentile (50 to 59.9999 = 6th decile). Is that it?

You are then saying I should go by what the View Info tooltip tells me, and not by simply looking at the plot with the naked eye. Is that so?

A bit awkward because my lecturer just told me the following:

[...] when you are interpreting the non-cumulative lift chart you are interested in seeing where the baseline is crossed, which may be midway within a decile - hence my advice to interpret from the visual representation.
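
To make "midway within a decile" concrete, here is a rough Python sketch of how a crossing point could be estimated by linear interpolation between two plotted points. The lift values are invented for illustration and are not read from my screenshot; the function name and the default baseline of 1.0 (the usual baseline for a lift chart) are also my own assumptions:

```python
def baseline_crossing(x0, y0, x1, y1, baseline=1.0):
    """Estimate the x position where the segment (x0, y0)-(x1, y1) meets the baseline."""
    if (y0 - baseline) * (y1 - baseline) > 0:
        raise ValueError("segment does not cross the baseline")
    if y0 == y1:
        # Flat segment lying on the baseline: report its left end.
        return x0
    # Straight-line interpolation between the two plotted points.
    return x0 + (baseline - y0) * (x1 - x0) / (y1 - y0)


# Hypothetical values: lift 1.12 at percentile 50 and lift 0.92 at percentile 60.
crossing = baseline_crossing(50, 1.12, 60, 0.92)
print(round(crossing, 1))  # 56.0
```

With these made-up numbers the crossing falls between the two axis lines rather than on either of them, which I think is the situation my lecturer is describing.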

Moreover, if you look at the purple line in the plot (Reg2B), it crosses the baseline just after decile 6. If I click on the line with View Info, it DOES say percentile 60 (the same as Reg2S), and not percentile 70, which I would have expected based on the mapping you explained in your previous post.

/me Very confused

Regards,

P.

Super User
Posts: 17,750

Re: Looking at percentiles in noncumulative response charts when assessing data mining models.

I think he's saying the tooltip is correct and your definition of the 5th decile is probably incorrect.

You can look at the plot with the naked eye, as long as you have the correct definition of deciles, and interpret it correctly within context of the scale of the x axis.

Contributor
Posts: 41

Re: Looking at percentiles in noncumulative response charts when assessing data mining models.

Hi Reeza,

Thanks for that.

I do have one question, however. If you look at the purple line in the plot (Reg2B), it crosses the baseline just after decile 6. If I click on the line with View Info, it DOES say percentile 60 (the same as Reg2S), and not percentile 70, which I would have expected based on Steve's mapping explanation in his previous post.

Seems to me then that the real percentiles are halfway through the lines in the plot. For example, percentile 40 is divided half to the right of the line that says percentile 40, and half to the left of the line that says percentile 40.

Is that a fair assumption?
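
To spell out what I mean, here is a small Python sketch of that assumption, where each bin is centred on an axis line and a point is labelled with the nearest line. This is only my guess at the behaviour, not something I have confirmed about Enterprise Miner, and the positions below are rough illustrative values:

```python
import math


def nearest_tick(percentile: float, step: int = 10) -> int:
    """Snap a percentile to the closest axis line (bins centred on the lines)."""
    return math.floor(percentile / step + 0.5) * step


for p in (36, 44, 56, 61):
    print(p, "->", nearest_tick(p))
# 36 -> 40
# 44 -> 40
# 56 -> 60
# 61 -> 60
```

Under this reading, a crossing at about 56 and a crossing just after 60 would both be reported as percentile 60, which matches what the tooltip showed me for Reg2S and Reg2B.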

Regards,

P.

Super User
Posts: 17,750

Re: Looking at percentiles in noncumulative response charts when assessing data mining models.

This may be a dumb question, but doesn't the tooltip depend on where on the chart you clicked? I thought that was the way it worked, and I would have expected 62 to be 70 not 60, but I don't have anything to test it on at the moment.
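
The rule I had in mind labels a point with the upper edge of the bin it falls in. A quick Python sketch of that expectation follows; it is an assumption on my part, not verified against Enterprise Miner, and the function name is hypothetical:

```python
import math


def upper_tick(percentile: float, step: int = 10) -> int:
    """Label a position with the upper edge of the bin that contains it."""
    return math.ceil(percentile / step) * step


print(upper_tick(62))  # 70
print(upper_tick(56))  # 60
```

If the tooltip reports 60 for a point at about 62, then it is evidently not using this rule.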

Contributor
Posts: 41

Re: Looking at percentiles in noncumulative response charts when assessing data mining models.

Hi Reeza,

If there are any dumb questions here, they are probably from me. I am still getting used to Enterprise Miner, so there is a lot that I am learning as I go. The issue is that I am learning mostly through the case study approach guide from SAS, so things get harder when my data does not behave like the book's (i.e. real data).

I understand where you are coming from, but sometimes View Info tells me something different depending on where I click on the line, even inside the same bin. Check out these screenshots. I hope you will be able to understand my confusion. Please bear with me.

First, look at the result of the yellow plot (model Reg2B, aka Reg-2)

10-05-2013 19-03-50.png

The plot line of Reg-2 crosses the baseline slightly after the vertical line of the 60th percentile, and the tooltip indicates that the point I clicked on is in fact at the 60th percentile. To me this makes sense. Both the chart and the info in the tooltip are in agreement.

Now, check out these two screenshots for the same model, which illustrate the exact reason for my confusion:

Reg2S (aka Reg) example 1
10-05-2013 19-05-05.png
In this example I clicked not that close to the baseline. You can see that the View Info tooltip says that this point is in the 50th percentile. To me that makes sense because it agrees with the vertical lines in the chart. That is, this point in the plot is within percentile 50 (it is roughly in the middle of the bin). So to me both the chart and the info in the tooltip are in agreement in this case as well.

Reg2S (aka Reg) example 2
10-05-2013 19-04-23.png
Now, in this case I clicked closer to the plot line. This time the tooltip told me this particular point is within the 60th percentile. This confuses me because the line is still inside the same bin (that is, still between the vertical percentile lines 50 and 60). To me, the chart is saying one thing, but the tooltip is saying another.


So to sum up, I don't understand whether the maroon (Reg2S) line crosses the baseline at the 50th percentile or the 60th percentile. Moreover, I don't understand why I would get such variation within the same bin. I was expecting to get a value of percentile 60 only at a point on the vertical line labelled 60 (as in the first screenshot, for Reg2B, the yellow plot).

My lecturer says I should ignore the info of the tooltip and look at the vertical lines of the chart. However based on the information I got from this forum and other SAS users, I am not so sure...

Thank you for the help.

Regards,

P.
