<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Logistic regression cstatistic of Validation set is larger than Train set in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/601149#M16680</link>
    <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;for the reply.&lt;/P&gt;&lt;P&gt;Can we tell whether the model is overfitting or underfitting from a goodness-of-fit statistic such as the H-L test? I went through the materials but couldn't figure it out. Could you please help me?&lt;/P&gt;</description>
    <pubDate>Sat, 02 Nov 2019 15:48:32 GMT</pubDate>
    <dc:creator>aranganayagi</dc:creator>
    <dc:date>2019-11-02T15:48:32Z</dc:date>
    <item>
      <title>Logistic regression cstatistic of Validation set is larger than Train set</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600661#M16625</link>
      <description>&lt;P&gt;Hi, I am very new to SAS Stats and am running a logistic regression.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am looking for answers to two questions:&lt;/P&gt;&lt;P&gt;1. I am getting "Model Convergence status is Quasi complete Separation of data Point detected". What is the implication of this warning, and how can I resolve it?&lt;/P&gt;&lt;P&gt;2. The C-statistic of the validation data set is larger than the C-statistic of the training set. Is this possible?&amp;nbsp;&lt;/P&gt;&lt;P&gt;My expectation is that the training set should perform better than the validation data set.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have attached the report for your reference.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Could you please help me answer these two questions?&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2019 11:23:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600661#M16625</guid>
      <dc:creator>aranganayagi</dc:creator>
      <dc:date>2019-10-31T11:23:01Z</dc:date>
    </item>
    <item>
      <title>Re: Logistic regression cstatistic of Validation set is larger than Train set</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600671#M16626</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/59319"&gt;@aranganayagi&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hi, I am very new to SAS Stats and am running a logistic regression.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am looking for answers to two questions:&lt;/P&gt;
&lt;P&gt;1. I am getting "Model Convergence status is Quasi complete Separation of data Point detected". What is the implication of this warning, and how can I resolve it?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;A href="https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/" target="_blank"&gt;https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-complete-or-quasi-complete-separation-in-logisticprobit-regression-and-how-do-we-deal-with-them/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;2. The C-statistic of the validation data set is larger than the C-statistic of the training set. Is this possible?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My expectation is that the training set should perform better than the validation data set.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes, it is possible. If the only difference between the training and validation sets is random noise, there's no reason the training set has to perform better; by chance, the model might fit the validation set better.&amp;nbsp;&lt;/P&gt;
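&lt;P&gt;To make the random-noise point concrete, here is a minimal sketch in Python (not SAS). It assumes a fixed scoring rule rather than a model refitted on each training half, so it isolates sampling noise only; the function names, sample sizes and seed are all hypothetical choices for illustration.&lt;/P&gt;
&lt;PRE&gt;
```python
import math
import random


def c_statistic(y, score):
    """Share of (event, non-event) pairs where the event scores higher; ties count 0.5."""
    events = [s for s, label in zip(score, y) if label == 1]
    non_events = [s for s, label in zip(score, y) if label == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                total += 1.0
            elif e == n:
                total += 0.5
    return total / (len(events) * len(non_events))


def simulate(n_train=200, n_valid=80, n_splits=100, seed=1):
    """Fraction of random splits in which the validation C exceeds the training C."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_splits):
        # Assumption: x is used directly as the score, with y drawn from logistic(x).
        x = [rng.gauss(0, 1) for _ in range(n_train + n_valid)]
        y = [1 if 1 / (1 + math.exp(-v)) > rng.random() else 0 for v in x]
        c_train = c_statistic(y[:n_train], x[:n_train])
        c_valid = c_statistic(y[n_train:], x[n_train:])
        if c_valid > c_train:
            wins += 1
    return wins / n_splits


print(simulate())
```
&lt;/PRE&gt;
&lt;P&gt;With a model that is refitted on the training half, the training C would usually gain some optimism on top of this noise, so the share would tend to be lower, but it need not be zero.&lt;/P&gt;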
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2019 12:25:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600671#M16626</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-10-31T12:25:55Z</dc:date>
    </item>
    <item>
      <title>Re: Logistic regression cstatistic of Validation set is larger than Train set</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600672#M16627</link>
      <description>&lt;P&gt;Question 1:&lt;/P&gt;
&lt;P&gt;You have sparse data for a categorical variable.&lt;/P&gt;
&lt;P&gt;Example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Y&amp;nbsp; &amp;nbsp;RACE&lt;/P&gt;
&lt;P&gt;1&amp;nbsp; &amp;nbsp; &amp;nbsp;white&amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/P&gt;
&lt;P&gt;1&amp;nbsp; &amp;nbsp;&amp;nbsp;white&lt;/P&gt;
&lt;P&gt;0&amp;nbsp; &amp;nbsp;&amp;nbsp;white&lt;/P&gt;
&lt;P&gt;1&amp;nbsp; &amp;nbsp; black&lt;/P&gt;
&lt;P&gt;1&amp;nbsp; &amp;nbsp; black&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can see that white has both 1 and 0, but black has only 1.&lt;/P&gt;
&lt;P&gt;You could remove this kind of variable from the model.&lt;/P&gt;
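&lt;P&gt;The check described above can be sketched in a few lines of Python (not SAS). The function name is hypothetical, and the rows are the same five observations as in the Y / RACE table; the idea is simply to flag any level of a categorical predictor for which only one outcome value occurs.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import defaultdict


def levels_with_single_outcome(rows):
    """Return predictor levels whose observations all share one outcome value."""
    outcomes_by_level = defaultdict(set)
    for y, level in rows:
        outcomes_by_level[level].add(y)
    return sorted(level for level, ys in outcomes_by_level.items() if len(ys) == 1)


# Same five (Y, RACE) rows as in the table above.
rows = [(1, "white"), (1, "white"), (0, "white"), (1, "black"), (1, "black")]
print(levels_with_single_outcome(rows))  # prints ['black']
```
&lt;/PRE&gt;
&lt;P&gt;Any level that shows up in this list is a candidate cause of the quasi-complete separation warning.&lt;/P&gt;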
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Question 2:&lt;/P&gt;
&lt;P&gt;Yes. Anything is possible.&lt;/P&gt;
&lt;P&gt;Since your Train and Validate data are random samples, anything can happen, especially when the Validate data set is smaller than the Train data set. (The C statistic from a smaller data set varies more, so a higher value is more likely to occur by chance.)&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2019 12:27:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600672#M16627</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-10-31T12:27:38Z</dc:date>
    </item>
    <item>
      <title>Re: Logistic regression cstatistic of Validation set is larger than Train set</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600674#M16628</link>
      <description>Thanks Paige Miller and Ksharp for the reply. It is very helpful.&lt;BR /&gt;&lt;BR /&gt;I have two more questions.&lt;BR /&gt;1. Based on the ROC curve and C statistics of the training and validation sets, can we determine whether the model is performing better?&lt;BR /&gt;&lt;BR /&gt;2. Is it necessary for the model to converge? (I mean, should we fix the quasi-complete separation warning?) If we don't fix it, what would be the implication?</description>
      <pubDate>Thu, 31 Oct 2019 12:35:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600674#M16628</guid>
      <dc:creator>aranganayagi</dc:creator>
      <dc:date>2019-10-31T12:35:16Z</dc:date>
    </item>
    <item>
      <title>Re: Logistic regression cstatistic of Validation set is larger than Train set</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600684#M16629</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/59319"&gt;@aranganayagi&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Thanks Paige Miller and Ksharp for the reply. It is very helpful.&lt;BR /&gt;&lt;BR /&gt;I have two more questions.&lt;BR /&gt;1. Based on the ROC curve and C statistics of the training and validation sets, can we determine whether the model is performing better?&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Better than what?&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;2. Is it necessary for the model to converge? (I mean, should we fix the quasi-complete separation warning?) If we don't fix it, what would be the implication?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The link I provided explains what to do in the presence of quasi-complete separation.&lt;/P&gt;</description>
      <pubDate>Thu, 31 Oct 2019 13:36:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600684#M16629</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2019-10-31T13:36:37Z</dc:date>
    </item>
    <item>
      <title>Re: Logistic regression cstatistic of Validation set is larger than Train set</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600932#M16660</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;1. Based on the ROC curve and C statistics of the training and validation sets, can we determine whether the model is performing better?&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;I would not trust the ROC curve or the C statistic; I prefer a goodness-of-fit statistic such as the H-L test.&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp;has written several blog posts about it.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;2. Is it necessary for the model to converge? (I mean, should we fix the quasi-complete separation warning?) If we don't fix it, what would be the implication?&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Yes, I think so. If the model does not converge, the output cannot be trusted.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Or&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp;might have something to add.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Nov 2019 11:41:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/600932#M16660</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-11-01T11:41:05Z</dc:date>
    </item>
    <item>
      <title>Re: Logistic regression cstatistic of Validation set is larger than Train set</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/601149#M16680</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;for the reply.&lt;/P&gt;&lt;P&gt;Can we tell whether the model is overfitting or underfitting from a goodness-of-fit statistic such as the H-L test? I went through the materials but couldn't figure it out. Could you please help me?&lt;/P&gt;</description>
      <pubDate>Sat, 02 Nov 2019 15:48:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/601149#M16680</guid>
      <dc:creator>aranganayagi</dc:creator>
      <dc:date>2019-11-02T15:48:32Z</dc:date>
    </item>
    <item>
      <title>Re: Logistic regression cstatistic of Validation set is larger than Train set</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/601218#M16692</link>
      <description>&lt;P&gt;No. GOF statistics can only tell you whether the model fits the sample data (the training data set) well or not.&lt;/P&gt;
&lt;P&gt;If you have good GOF statistics, it usually hints that the model is NOT overfit and does NOT lack fit.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have SAS 9.4M6, you could try:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc logistic ....&lt;/P&gt;
&lt;P&gt;model ........ / GOF ;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If not, try:&lt;/P&gt;
&lt;P&gt;model ......../ LACKFIT ;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
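&lt;P&gt;For intuition about what the Hosmer and Lemeshow test requested by LACKFIT measures, here is a rough Python sketch (not SAS) of the textbook statistic. The function name and the toy data are hypothetical, and the equal-size grouping used here is an assumption; PROC LOGISTIC may group observations differently, so the numbers are not expected to match its output.&lt;/P&gt;
&lt;PRE&gt;
```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (chi-square statistic, degrees of freedom) comparing observed and expected events.

    Assumes at least `groups` observations and that no group has an expected
    event count of exactly 0 or exactly its own size.
    """
    pairs = sorted(zip(p, y))  # order observations by predicted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(label for _, label in chunk)
        expected = sum(prob for prob, _ in chunk)
        # (O - E)^2 / (n_g * pbar * (1 - pbar)), with pbar = expected / size
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, groups - 2


# Hypothetical predicted probabilities and outcomes, four groups of two.
p = [0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]
y = [0, 0, 0, 1, 1, 0, 1, 1]
print(hosmer_lemeshow(y, p, groups=4))
```
&lt;/PRE&gt;
&lt;P&gt;A small statistic means the observed event counts track the predicted ones within each group, which is a statement about calibration on the data being scored, not about how the model will do on new data.&lt;/P&gt;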
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another GOF check is whether the model is overdispersed:&lt;/P&gt;
&lt;P&gt;model ............/ scale=none aggregate ;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Search the PROC LOGISTIC documentation or Rick's blog and you will find it.&lt;/P&gt;</description>
      <pubDate>Sun, 03 Nov 2019 11:21:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Logistic-regression-cstatistic-of-Validation-set-is-larger-than/m-p/601218#M16692</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-11-03T11:21:22Z</dc:date>
    </item>
  </channel>
</rss>

