<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SCORING DATA WITH LOGIT MODEL in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/SCORING-DATA-WITH-LOGIT-MODEL/m-p/72329#M20954</link>
    <description>Thanks! This is the answer I need. I have a follow-up inquiry:&lt;BR /&gt;
&lt;BR /&gt;
The purpose of my model is to assign probabilities to 'patrons' who use our service. I want to concentrate our efforts on those that the model predicts to be 'in the middle of the road'; in theory, they would be the most sensitive to promotions and interventions.&lt;BR /&gt;
&lt;BR /&gt;
I'm getting a significant HL chi-square at the development stage (implying poor fit), but I have c = 0.74 and 64% correct predictions (with similar results from the scored 2008 data). Of course, the max-rescaled R-square for my model is only 0.20. Almost all individual predictors are significant at the 0.001 or 0.05 level, with the expected signs.&lt;BR /&gt;
&lt;BR /&gt;
Can you, or someone else, give me an opinion on how much importance I should attribute to the HL test with regard to the utility of my model? Most of the papers I have reviewed don't even report HL results. In grad school, log-likelihood-based diagnostics and the percentage of correct predictions were the main emphasis in my courses.&lt;BR /&gt;
&lt;BR /&gt;
Greene confuses me even more with his remarks regarding maximum likelihood estimators:&lt;BR /&gt;
&lt;BR /&gt;
'It remains an interesting question for research whether fitting y well or obtaining good parameter estimates is a preferable estimation criterion. Evidently, they need not be the same thing.' (Greene, Econometric Analysis, 5th ed., p. 686)&lt;BR /&gt;
&lt;BR /&gt;
Thanks.</description>
    <pubDate>Wed, 04 Feb 2009 14:57:39 GMT</pubDate>
    <dc:creator>deleted_user</dc:creator>
    <dc:date>2009-02-04T14:57:39Z</dc:date>
    <item>
      <title>SCORING DATA WITH LOGIT MODEL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SCORING-DATA-WITH-LOGIT-MODEL/m-p/72327#M20952</link>
      <description>How can I get the Hosmer &amp;amp; Lemeshow goodness-of-fit test to work on scored data? (That is, I want to score new data using a previously fitted model and get this test.)&lt;BR /&gt;
&lt;BR /&gt;
I'm using PROC LOGISTIC to predict the probability that Y = 1 (as an example).&lt;BR /&gt;
When I develop the model using five years of data, I test it against the development data and ask for an ROC curve plot, the area under it, the percentage of correct predictions (from a classification table), and the Hosmer and Lemeshow goodness-of-fit test.&lt;BR /&gt;
&lt;BR /&gt;
I also want to score a 'validation' data set (this year's data). To do this (by guessing), I add a line of code:&lt;BR /&gt;
SCORE DATA = PROJECTS.DATA_08 OUT = SCORE2 OUTROC = ROC_DATA FITSTAT;&lt;BR /&gt;
&lt;BR /&gt;
This appears to give me the ROC plot and fit statistics for the 2008 data, but of course, I can't figure out how to get the Hosmer &amp;amp; Lemeshow test on the scored data. (I can't find a way to add a LACKFIT option anywhere, as I did for the development data set in the MODEL statement.)&lt;BR /&gt;
&lt;BR /&gt;
Here is the code I'm using in full:&lt;BR /&gt;
&lt;BR /&gt;
ODS GRAPHICS ON;&lt;BR /&gt;
ODS HTML;&lt;BR /&gt;
PROC LOGISTIC PLOTS = ROC DATA = PROJECTS.DATA_02_07;&lt;BR /&gt;
&lt;BR /&gt;
CLASS X1 X2 X3 / PARAM = GLM;&lt;BR /&gt;
&lt;BR /&gt;
MODEL Y (EVENT = '1') = X1 X2 X3 X4 X5 / LACKFIT RSQ TECHNIQUE = NEWTON PPROB = .50 CTABLE;&lt;BR /&gt;
&lt;BR /&gt;
SCORE OUT = SCORE1 FITSTAT;&lt;BR /&gt;
SCORE DATA = PROJECTS.ENRL_08 OUT = SCORE2 OUTROC = ROC_DATA FITSTAT;&lt;BR /&gt;
/* THE ABOVE LINE APPEARS TO SCORE THE DESIGNATED DATA SET&lt;BR /&gt;
GIVEN THE MODEL JUST DEVELOPED &amp;amp; GIVES A ROC PLOT, FITSTATS, NO HL */&lt;BR /&gt;
&lt;BR /&gt;
OUTPUT OUT = PRED RESDEV = RESDEV RESCHI = RESCHI H = HAT P = PHAT&lt;BR /&gt;
LOWER = LCL UPPER = UCL PREDPROB = (INDIVIDUAL CROSSVALIDATE);&lt;BR /&gt;
/* P =, PRED =, and PREDICTED = are aliases for the same option,&lt;BR /&gt;
so only one (P = PHAT) is kept here */&lt;BR /&gt;
&lt;BR /&gt;
RUN;&lt;BR /&gt;
ODS HTML CLOSE;&lt;BR /&gt;
ODS GRAPHICS OFF;&lt;BR /&gt;
&lt;BR /&gt;
Any suggestions?</description>
      <pubDate>Tue, 03 Feb 2009 22:43:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SCORING-DATA-WITH-LOGIT-MODEL/m-p/72327#M20952</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-02-03T22:43:06Z</dc:date>
    </item>
    <item>
      <title>Re: SCORING DATA WITH LOGIT MODEL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SCORING-DATA-WITH-LOGIT-MODEL/m-p/72328#M20953</link>
      <description>Because the Hosmer &amp;amp; Lemeshow (HL) test is a summary statistic, it can't be part of the score data set. You have to read the score data set and compute the HL statistic manually (the formula is in the documentation).&lt;BR /&gt;
&lt;BR /&gt;
First, run PROC RANK on the predicted probabilities to get the desired number of groups. Then sum the predicted probabilities in each group (that is the expected N), count the number of events in that group (the observed N), and use the formula to compute the chi-square, as sketched below.&lt;BR /&gt;
&lt;BR /&gt;
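For concreteness, here is a minimal sketch of that calculation (untested; it assumes your scored data set SCORE2 carries the 0/1 response Y and the event probability in P_1, the variable the SCORE statement creates for response level '1'):&lt;BR /&gt;
&lt;BR /&gt;
/* Step 1: form 10 groups on the predicted probability */&lt;BR /&gt;
PROC RANK DATA = SCORE2 GROUPS = 10 OUT = RANKED;&lt;BR /&gt;
VAR P_1;&lt;BR /&gt;
RANKS HL_GROUP;&lt;BR /&gt;
RUN;&lt;BR /&gt;
&lt;BR /&gt;
/* Step 2: per group, the size, observed events, expected events,&lt;BR /&gt;
and mean predicted probability */&lt;BR /&gt;
PROC SQL;&lt;BR /&gt;
CREATE TABLE HL_PARTS AS&lt;BR /&gt;
SELECT HL_GROUP,&lt;BR /&gt;
COUNT(*) AS N_G,&lt;BR /&gt;
SUM(Y) AS O_G, /* observed events */&lt;BR /&gt;
SUM(P_1) AS E_G, /* expected events */&lt;BR /&gt;
AVG(P_1) AS PBAR_G /* mean predicted probability */&lt;BR /&gt;
FROM RANKED&lt;BR /&gt;
GROUP BY HL_GROUP;&lt;BR /&gt;
QUIT;&lt;BR /&gt;
&lt;BR /&gt;
/* Step 3: HL chi-square = sum over groups of&lt;BR /&gt;
(O_g - E_g)**2 / (E_g * (1 - E_g/n_g)),&lt;BR /&gt;
referred to chi-square with (groups - 2) df */&lt;BR /&gt;
DATA _NULL_;&lt;BR /&gt;
SET HL_PARTS END = LAST;&lt;BR /&gt;
CHISQ + (O_G - E_G)**2 / (E_G * (1 - E_G / N_G));&lt;BR /&gt;
NGROUPS + 1;&lt;BR /&gt;
IF LAST THEN DO;&lt;BR /&gt;
DF = NGROUPS - 2;&lt;BR /&gt;
P_VALUE = 1 - PROBCHI(CHISQ, DF);&lt;BR /&gt;
PUT 'HL chi-square = ' CHISQ 'DF = ' DF 'p = ' P_VALUE;&lt;BR /&gt;
END;&lt;BR /&gt;
RUN;&lt;BR /&gt;
&lt;BR /&gt;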
The HL test doesn't have very good power, so failing to reject does not imply an acceptable fit. A more sensitive, but subjective, measure is to plot the mean predicted probability of the event for each group on the x-axis and the observed proportion of events on the y-axis; a good fit falls along the 45-degree line. Unfortunately, you can't put a number on it the way you can interpret the c-statistic as the area under the ROC curve; you just have to look at a lot of data.&lt;BR /&gt;
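&lt;BR /&gt;
Here is a sketch of that plot as well, reusing the grouped summary HL_PARTS from the code above (PROC SGPLOT requires SAS 9.2 or later):&lt;BR /&gt;
&lt;BR /&gt;
DATA CALIB;&lt;BR /&gt;
SET HL_PARTS;&lt;BR /&gt;
OBS_PROP = O_G / N_G; /* observed proportion of events */&lt;BR /&gt;
RUN;&lt;BR /&gt;
&lt;BR /&gt;
PROC SGPLOT DATA = CALIB;&lt;BR /&gt;
SCATTER X = PBAR_G Y = OBS_PROP;&lt;BR /&gt;
LINEPARM X = 0 Y = 0 SLOPE = 1; /* 45-degree reference line */&lt;BR /&gt;
XAXIS LABEL = 'Mean predicted probability';&lt;BR /&gt;
YAXIS LABEL = 'Observed proportion of events';&lt;BR /&gt;
RUN;&lt;BR /&gt;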
&lt;BR /&gt;
Doc Muhlbaier&lt;BR /&gt;
Duke</description>
      <pubDate>Wed, 04 Feb 2009 02:08:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SCORING-DATA-WITH-LOGIT-MODEL/m-p/72328#M20953</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2009-02-04T02:08:52Z</dc:date>
    </item>
    <item>
      <title>Re: SCORING DATA WITH LOGIT MODEL</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SCORING-DATA-WITH-LOGIT-MODEL/m-p/72329#M20954</link>
      <description>Thanks! This is the answer I need. I have a follow-up inquiry:&lt;BR /&gt;
&lt;BR /&gt;
The purpose of my model is to assign probabilities to 'patrons' who use our service. I want to concentrate our efforts on those that the model predicts to be 'in the middle of the road'; in theory, they would be the most sensitive to promotions and interventions.&lt;BR /&gt;
&lt;BR /&gt;
I'm getting a significant HL chi-square at the development stage (implying poor fit), but I have c = 0.74 and 64% correct predictions (with similar results from the scored 2008 data). Of course, the max-rescaled R-square for my model is only 0.20. Almost all individual predictors are significant at the 0.001 or 0.05 level, with the expected signs.&lt;BR /&gt;
&lt;BR /&gt;
Can you, or someone else, give me an opinion on how much importance I should attribute to the HL test with regard to the utility of my model? Most of the papers I have reviewed don't even report HL results. In grad school, log-likelihood-based diagnostics and the percentage of correct predictions were the main emphasis in my courses.&lt;BR /&gt;
&lt;BR /&gt;
Greene confuses me even more with his remarks regarding maximum likelihood estimators:&lt;BR /&gt;
&lt;BR /&gt;
'It remains an interesting question for research whether fitting y well or obtaining good parameter estimates is a preferable estimation criterion. Evidently, they need not be the same thing.' (Greene, Econometric Analysis, 5th ed., p. 686)&lt;BR /&gt;
&lt;BR /&gt;
Thanks.</description>
      <pubDate>Wed, 04 Feb 2009 14:57:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SCORING-DATA-WITH-LOGIT-MODEL/m-p/72329#M20954</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-02-04T14:57:39Z</dc:date>
    </item>
  </channel>
</rss>