ROC curve for hold out sample

05-10-2017 08:10 AM

Hi all,

I want to minimize the Euclidean distance between the point (0,1) and my ROC curve. I have trained my logistic model on the set train2007 and want to test the model on the set pred2008. I have tried this code:

proc logistic data=sasdata.train2007;
   model flag(event="1") = TL_TA EAT_TA / CTABLE outroc=troc;
   score data=sasdata.pred2008 out=valpred outroc=vroc;
   roc; roccontrast;
run;

The thing is, CTABLE only gives me the misclassification for the train2007 dataset. I want to find the misclassification for the pred2008 dataset and find the optimal cutoff point by minimizing the Euclidean distance in that dataset.

Posted in reply to yocrachi

05-10-2017 09:47 AM

Since you can get the ROC table, you can get that cutoff point from it (make _FALPOS_ and _FALNEG_ as small as possible).

Obs | _PROB_ | _POS_ | _NEG_ | _FALPOS_ | _FALNEG_ | _SENSIT_ | _1MSPEC_ |
---|---|---|---|---|---|---|---|
1 | 0.97674 | 1 | 10 | 0 | 8 | 0.11111 | 0.0 |
2 | 0.88438 | 2 | 10 | 0 | 7 | 0.22222 | 0.0 |
3 | 0.86057 | 3 | 10 | 0 | 6 | 0.33333 | 0.0 |
4 | 0.77359 | 4 | 10 | 0 | 5 | 0.44444 | 0.0 |
5 | 0.75478 | 4 | 9 | 1 | 5 | 0.44444 | 0.1 |
6 | 0.70552 | 5 | 9 | 1 | 4 | 0.55556 | 0.1 |
7 | 0.59623 | 6 | 9 | 1 | 3 | 0.66667 | 0.1 |
8 | 0.58251 | 7 | 8 | 2 | 2 | 0.77778 | 0.2 |
9 | 0.57043 | 7 | 7 | 3 | 2 | 0.77778 | 0.3 |
10 | 0.56984 | 7 | 6 | 4 | 2 | 0.77778 | 0.4 |
11 | 0.42628 | 8 | 6 | 4 | 1 | 0.88889 | 0.4 |
12 | 0.23915 | 9 | 6 | 4 | 0 | 1.00000 | 0.4 |
13 | 0.14442 | 9 | 5 | 5 | 0 | 1.00000 | 0.5 |
14 | 0.13271 | 9 | 4 | 6 | 0 | 1.00000 | 0.6 |
15 | 0.10293 | 9 | 3 | 7 | 0 | 1.00000 | 0.7 |
16 | 0.07392 | 9 | 2 | 8 | 0 | 1.00000 | 0.8 |
17 | 0.02239 | 9 | 1 | 9 | 0 | 1.00000 | 0.9 |
18 | 0.00109 | 9 | 0 | 10 | 0 | 1.00000 | 1.0 |
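As a rough illustration of "as small as possible", the sketch below re-enters the table above and lists the rows with the fewest total misclassifications. The dataset name roc_example and the variable miscl are made up for this example; only the column names come from the OUTROC table.

```sas
/* Hypothetical dataset roc_example: the example ROC table typed in by hand. */
data roc_example;
    input _PROB_ _POS_ _NEG_ _FALPOS_ _FALNEG_ _SENSIT_ _1MSPEC_;
    /* miscl is an illustrative name: total misclassified observations */
    miscl = _FALPOS_ + _FALNEG_;
    datalines;
0.97674 1 10 0 8 0.11111 0.0
0.88438 2 10 0 7 0.22222 0.0
0.86057 3 10 0 6 0.33333 0.0
0.77359 4 10 0 5 0.44444 0.0
0.75478 4 9 1 5 0.44444 0.1
0.70552 5 9 1 4 0.55556 0.1
0.59623 6 9 1 3 0.66667 0.1
0.58251 7 8 2 2 0.77778 0.2
0.57043 7 7 3 2 0.77778 0.3
0.56984 7 6 4 2 0.77778 0.4
0.42628 8 6 4 1 0.88889 0.4
0.23915 9 6 4 0 1.00000 0.4
0.14442 9 5 5 0 1.00000 0.5
0.13271 9 4 6 0 1.00000 0.6
0.10293 9 3 7 0 1.00000 0.7
0.07392 9 2 8 0 1.00000 0.8
0.02239 9 1 9 0 1.00000 0.9
0.00109 9 0 10 0 1.00000 1.0
;
run;

/* Keep every row that attains the smallest total misclassification count. */
proc sql;
    select _PROB_, _FALPOS_, _FALNEG_, miscl
    from roc_example
    where miscl = (select min(miscl) from roc_example);
quit;
```

In this table the minimum count is 4, and it is reached at three cutoffs (Obs 7, 8 and 12), so the count alone does not always single out one cutoff.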

Posted in reply to Ksharp

05-10-2017 10:16 AM

Thank you. But the CTABLE gives me the sensitivity and specificity for train2007. I guess you mean that I should do that for the vroc file, right?
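A minimal sketch of that idea, assuming the vroc dataset written by the SCORE statement holds the usual OUTROC variables (_PROB_, _SENSIT_, _1MSPEC_, _FALPOS_, _FALNEG_). The names vroc_dist and dist are invented for illustration.

```sas
/* Distance from each hold-out ROC point (_1MSPEC_, _SENSIT_) to the corner (0,1). */
data vroc_dist;
    set vroc;
    dist = sqrt((1 - _SENSIT_)**2 + _1MSPEC_**2);
run;

/* Smallest distance first. */
proc sort data=vroc_dist;
    by dist;
run;

/* The first row is the cutoff (_PROB_) closest to (0,1) on pred2008. */
proc print data=vroc_dist(obs=1);
    var _PROB_ _SENSIT_ _1MSPEC_ _FALPOS_ _FALNEG_ dist;
run;
```

Applied to the small example table posted earlier in the thread, the same formula picks Obs 8 (_PROB_ = 0.58251), with a distance of about 0.299.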

Posted in reply to yocrachi

05-10-2017 10:31 AM - edited 05-10-2017 11:14 AM

And one more thing. My first observation has the lowest misclassification: it has only false negatives, 1000 of them. That would mean the optimal cutoff is at (0,0). However, my ROC curve looks like the attached document. We can see from the curve that the point with the least Euclidean distance from (0,1) is not (0,0). I thought the minimum Euclidean distance would be where the misclassification of both false positives and false negatives is at a minimum? And from the ROC curve, that's not the point (0,0).

Posted in reply to yocrachi

05-11-2017 08:10 AM

The OUTROC table can also give you sensitivity and 1-specificity.

I don't understand your Euclidean distance here. Do you mean the point where slope=1?

Obs | _PROB_ | _POS_ | _NEG_ | _FALPOS_ | _FALNEG_ | _SENSIT_ | _1MSPEC_ |
---|---|---|---|---|---|---|---|

Posted in reply to Ksharp

05-11-2017 08:35 AM

Actually, I think I'm just misunderstanding exactly what the minimum Euclidean distance from (0,1) to the ROC curve is saying. The articles say it's a way of finding an optimal cutoff point. I thought the minimum Euclidean distance cutoff point gave me where both false negatives and false positives are at a minimum, but that's not what the minimum Euclidean distance gives me. I just want to know what it gives me then?
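For reference, the two criteria can be written side by side. The symbols are not from the thread and are introduced only for this note: Se(c) and Sp(c) are sensitivity and specificity at cutoff c, and n_1 and n_0 are the numbers of events and non-events in the scored data.

```latex
\begin{align*}
d(c) &= \sqrt{\bigl(1 - \mathrm{Se}(c)\bigr)^2 + \bigl(1 - \mathrm{Sp}(c)\bigr)^2}, \\
\mathrm{FN}(c) + \mathrm{FP}(c) &= n_1\bigl(1 - \mathrm{Se}(c)\bigr) + n_0\bigl(1 - \mathrm{Sp}(c)\bigr).
\end{align*}
```

The distance works on the two error rates and ignores class sizes, while the misclassification count weights those rates by n_1 and n_0. When events are rare (n_1 much smaller than n_0), the count can be smallest at (0,0), where everything is classified as a non-event and only false negatives occur, even though the distance from (0,0) to (0,1) is 1, which is generally far from the minimum.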

Posted in reply to yocrachi

05-11-2017 08:50 AM - edited 05-11-2017 09:09 AM

I think you mean the point which has slope=1.

You can get it by calculating the slope:

slope=(y2-y1)/(x2-x1)

Here y is sensitivity and x is 1-specificity.

You can calculate these slopes from each pair of adjacent observations.

When you see slope > 1 followed by slope < 1, you have found the slope=1 point.
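A possible way to compute those adjacent-point slopes, sketched under the assumption that the hold-out ROC points are in the vroc dataset from the original post. The names vroc_sorted, vroc_slope, dx, dy and slope are illustrative.

```sas
/* Order the points from left to right along the ROC curve. */
proc sort data=vroc out=vroc_sorted;
    by _1MSPEC_ _SENSIT_;
run;

data vroc_slope;
    set vroc_sorted;
    dx = dif(_1MSPEC_);   /* x2 - x1: change in 1-specificity  */
    dy = dif(_SENSIT_);   /* y2 - y1: change in sensitivity    */
    /* slope stays missing on the first row and on vertical steps (dx = 0) */
    if dx > 0 then slope = dy / dx;
run;
```

An empirical ROC curve is a step function, so these slopes can jump around rather than cross 1 cleanly. Also, on a smooth curve the slope=1 point is where sensitivity minus (1-specificity) is largest (Youden's index), which need not be the point closest to (0,1).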