<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: logistic regression - work on train data or all data? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953556#M372510</link>
    <description>&lt;P&gt;You have to identify the training data in data set PANEL. For example (and there are many ways to do this), suppose the variable TRAIN has the value 1 for observations in the training data and the value 0 otherwise. Then:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc logistic data=panel(where=(train=1)) namelen=60 descending;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Fri, 13 Dec 2024 22:10:26 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2024-12-13T22:10:26Z</dc:date>
    <item>
      <title>logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953555#M372509</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;I want to run logistic regression to build a credit risk model.&lt;/P&gt;
&lt;P&gt;The data contains 100,000 rows and has an indicator variable that tells whether an observation belongs to the training data (in-sample) or the test data (out-of-sample).&lt;/P&gt;
&lt;P&gt;To build the model, we need to work only on the training data (in-sample).&lt;/P&gt;
&lt;P&gt;My question: why are we using the whole data set (called &lt;CODE class=" language-sas"&gt;panel&lt;/CODE&gt; in my code), which includes both training and test data?&lt;/P&gt;
&lt;P&gt;How does SAS know to build the model only on the training data if the data set it is given contains both training and test data?&lt;/P&gt;
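&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc genmod data=panel namelen=60 descending;
   /* coefficients go to their own data set so they do not collide with OUTPUT OUT=want */
   ods output parameterestimates=tbl_coef;
   class X W Z R;
   model TARGET = X W Z R / dist=binomial link=logit type3 wald;
   /* predicted probability and linear predictor for every row read by the procedure */
   output out=want p=P_BAD xbeta=logit;
   ods select ModelANOVA;
run;&lt;/CODE&gt;&lt;/PRE&gt;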
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2024 22:02:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953555#M372509</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2024-12-13T22:02:38Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953556#M372510</link>
      <description>&lt;P&gt;You have to identify the training data in data set PANEL. For example (and there are many ways to do this), suppose the variable TRAIN has the value 1 for observations in the training data and the value 0 otherwise. Then:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc logistic data=panel(where=(train=1)) namelen=60 descending;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 13 Dec 2024 22:10:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953556#M372510</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-12-13T22:10:26Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953558#M372512</link>
      <description>&lt;P&gt;You can simultaneously fit the model to the training portion of your data and evaluate the fitted model on both the training and test portions by using the PARTITION statement in PROC HPLOGISTIC. The following is a simplified version of the example titled "" in the HPLOGISTIC documentation in the SAS/STAT User's Guide (&lt;A href="https://support.sas.com/en/software/sas-stat-support.html" target="_blank"&gt;https://support.sas.com/en/software/sas-stat-support.html&lt;/A&gt;). The ROLEVAR= option lets you specify the variable in your data set that distinguishes the training and test portions. The output will show you fit statistics for both portions. Note also that instead of using the DESCENDING option, it is safer to always use the EVENT= option (in the LOGISTIC, HPLOGISTIC, or GENMOD procedure) to be sure that you are modeling the level of the response variable that you consider the event level of interest.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc hplogistic data=Sashelp.JunkMail;
   model Class(event='1')=Make Address All _3d Our Over Remove Internet Order;
   partition rolevar=Test(train='0' test='1');
run;
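
/* Sketch only, not from the documentation: the same idea applied to the PANEL
   data from the question. Assumes a variable TRAIN that is 1 for training rows
   and 0 for test rows, and that TARGET=1 is the event of interest. Adjust both
   assumptions to match your data. */
proc hplogistic data=panel;
   class X W Z R;
   model TARGET(event='1') = X W Z R;
   partition rolevar=train(train='1' test='0');
run;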

&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 13 Dec 2024 22:50:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953558#M372512</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2024-12-13T22:50:01Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953559#M372513</link>
      <description>&lt;P&gt;So if I run PROC GENMOD and want to build the model only on the training data, I should add the condition (Where=(Train=1))?&lt;/P&gt;
&lt;P&gt;(The Train variable indicates whether an observation belongs to the in-sample or the out-of-sample data.)&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2024 22:54:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953559#M372513</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2024-12-13T22:54:09Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953560#M372514</link>
      <description>&lt;P&gt;That's exactly what I said.&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2024 23:05:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953560#M372514</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-12-13T23:05:59Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953561#M372515</link>
      <description>&lt;P&gt;I want to focus on PROC GENMOD, please.&lt;/P&gt;
&lt;P&gt;Let's say that the variable outsample takes the values 1 or 0 (1 is test data, 0 is training data).&lt;/P&gt;
&lt;P&gt;I want to calculate the model coefficients based on the training data only.&lt;/P&gt;
&lt;P&gt;I want to calculate P_BAD for the whole population (training + test data).&lt;/P&gt;
&lt;P&gt;I was told at work that when I run the code below, the coefficients are calculated on the training data only.&lt;/P&gt;
&lt;P&gt;As you can see, there is nothing in the code related to outsample=0.&lt;/P&gt;
&lt;P&gt;Can you please tell me how the model is calculated in this code? (Based on the training data only, or on train + test?)&lt;/P&gt;
&lt;P&gt;I checked it and it is true! In the code below, SAS calculates the model based on the training data only.&lt;/P&gt;
&lt;P&gt;My question is: how does SAS know to calculate it only on the training data?&lt;/P&gt;
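&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc genmod data=panel namelen=60 descending;
   /* coefficients go to their own data set so they do not collide with OUTPUT OUT=want */
   ods output parameterestimates=tbl_coef;
   class X W Z R;
   model TARGET = X W Z R / dist=binomial link=logit type3 wald;
   /* predicted probability and linear predictor for every row read by the procedure */
   output out=want p=P_BAD xbeta=logit;
   ods select ModelANOVA;
run;&lt;/CODE&gt;&lt;/PRE&gt;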
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 13 Dec 2024 23:09:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953561#M372515</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2024-12-13T23:09:53Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953562#M372516</link>
      <description>&lt;P&gt;The variable that indicates training or test data is called outsample.&lt;/P&gt;
&lt;P&gt;These two programs produce the same coefficients.&lt;/P&gt;
&lt;P&gt;Why?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc genmod data=panel  namelen=60 descending ;
ods output parameterestimates=tbl_coef;
class  X W Z t;
model TARGET=X W Z t / dist=binomial link=logit  type3 wald ;
output out=tbl_Want1   
p=P_BAD xbeta=logit;
ODS SELECT ModelANOVA; 
run;


proc genmod data=panel(Where=(outsample=0))  namelen=60 descending ;
ods output parameterestimates=tbl_coef;
class  X W Z t;
model TARGET=X W Z t / dist=binomial link=logit  type3 wald ;
output out=tbl_Want1   
p=P_BAD xbeta=logit;
ODS SELECT ModelANOVA; 
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 13 Dec 2024 23:14:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953562#M372516</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2024-12-13T23:14:35Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953563#M372517</link>
      <description>&lt;P&gt;With PROC LOGISTIC, see the example named "ROC analysis using separate training and validation data sets" here:&amp;nbsp;&lt;A href="https://support.sas.com/kb/39/724.html" target="_blank" rel="noopener"&gt;https://support.sas.com/kb/39/724.html&lt;/A&gt;. So LOGISTIC does exactly what you want.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
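&lt;P&gt;As a rough sketch of that approach (not copied from the linked note): fit on the training rows only, then use the SCORE statement to compute predicted probabilities for every row. The data set and variable names are the ones used in this thread; the event level '1' and the output data set name SCORED are assumptions for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Assumes outsample=0 marks training rows and TARGET=1 is the event of interest */
proc logistic data=panel(where=(outsample=0));
   class X W Z R;
   model TARGET(event='1') = X W Z R;
   /* score all rows (train + test) with the model fitted on the training rows */
   score data=panel out=scored;
run;&lt;/CODE&gt;&lt;/PRE&gt;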
&lt;P&gt;This method does not work in PROC GENMOD, and it's not clear to me how to do this with PROC GENMOD alone. You will probably need PROC GENMOD + PROC PLM.&lt;/P&gt;</description>
      <pubDate>Sat, 14 Dec 2024 00:00:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953563#M372517</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-12-14T00:00:15Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953564#M372518</link>
      <description>&lt;P&gt;Apparently Outsample=0 for ALL of the observations used.&lt;/P&gt;
&lt;P&gt;What does the log show? Typically, with a WHERE= data set option there will be a note about how many observations meet the condition, and another about how many total observations were used by the model.&lt;/P&gt;</description>
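&lt;P&gt;One quick way to check this (a sketch, assuming the data set and variable names used in this thread) is to tabulate the indicator:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc freq data=panel;
   /* row counts for each value of outsample, including missing values */
   tables outsample / missing;
run;&lt;/CODE&gt;&lt;/PRE&gt;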
      <pubDate>Fri, 13 Dec 2024 23:23:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953564#M372518</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-12-13T23:23:36Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953569#M372520</link>
      <description>"I was told at work that when I run the code below, the coefficients are calculated on the training data only.&lt;BR /&gt;As you can see, there is nothing in the code related to outsample=0."&lt;BR /&gt;That is not true. Since the data set "PANEL" contains all the data and there is no outsample=0 condition in your code, your code is building a model based on ALL the data, not just the TRAIN data.&lt;BR /&gt;You should ask your mentor about this.&lt;BR /&gt;&lt;BR /&gt;I think your mentor ran this code just to get a CUTOFF value, or the BEST p-value, to yield Yhat=0 or Yhat=1.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;"My question is: how does SAS know to calculate it only on the training data?"&lt;BR /&gt;SAS doesn't know. Your code runs on ALL (train+test) data,&lt;BR /&gt;unless you are using SAS/EM and have assigned the variable 'outsample' the roles 'train' and 'test'.</description>
      <pubDate>Sat, 14 Dec 2024 06:24:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953569#M372520</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-12-14T06:24:33Z</dc:date>
    </item>
    <item>
      <title>Re: logistic regression - work on train data or all data?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953570#M372521</link>
      <description>&lt;P&gt;I found the answer.&lt;/P&gt;
&lt;P&gt;The task is to calculate the model coefficients based on the training data only, and to calculate predicted values (Yhat) for both the training and test data.&lt;/P&gt;
&lt;P&gt;One way to do it is to create a binary variable (for example, called weight) that takes the value 1 for training data and 0 for test data, and then use it in a WEIGHT statement in PROC GENMOD.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another way (a trick) is to create another response variable (for example, _Y_) that is missing for the test data.&lt;/P&gt;
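&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Simulated training data: 50 rows, Y=1 with probability 0.6 */
data train(keep=X Y);
   call streaminit(1234579);   /* seed RAND so the example is reproducible */
   do i=1 to 50;
      x=ranuni(1234579);
      p=0.6;
      u=rand("Uniform");
      if (u &amp;lt; p) then Y=1; else Y=0;
      output;
   end;
run;
/* Simulated test data: 25 rows generated the same way */
data test(keep=X Y);
   call streaminit(9754321);   /* different RAND seed from the training data */
   do i=1 to 25;
      x=ranuni(1234579);
      p=0.6;
      u=rand("Uniform");
      if (u &amp;lt; p) then Y=1; else Y=0;
      output;
   end;
run;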


/* Way 1: calculate coefficients from the training data only; calculate predictions for train + test */
/* Uses a WEIGHT statement with a weight variable that identifies train/test rows */
data panel;
   set train(in=a) test(in=b);
   if a then weight=1; else weight=0;   /* 1 for training data, 0 for test data */
run;
proc genmod data=panel;
   ods output parameterestimates=tbl_coefficients;   /* coefficients calculated from the training data only */
   weight weight;
   model y=x / dist=binomial link=logit type3 wald;
   output out=preds_tbl pred=P_BAD xbeta=logit;
run;



/* Way 2: calculate coefficients from the training data only; calculate predictions for train + test */
/* Trick: set the response variable to missing for the test data */
data panel_b;
   set train(in=a) test(in=b);
   if a then _Y_=Y; else _Y_=.;
run;
proc genmod data=panel_b;
   ods output parameterestimates=tbl_coefficients;   /* coefficients calculated from the training data only */
   model _Y_=x / dist=binomial link=logit type3 wald;
   output out=preds_tbl pred=P_BAD xbeta=logit;
run;&lt;/CODE&gt;&lt;/PRE&gt;
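&lt;P&gt;A possible third way, following the PROC GENMOD + PROC PLM combination mentioned earlier in this thread, is sketched below. It reuses the PANEL data set and WEIGHT indicator created above; the item store name TRAIN_MODEL and the output data set PREDS_PLM are made-up names for illustration, and I have not compared its results against the two ways above.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Fit on the training rows only and save the fitted model in an item store */
proc genmod data=panel(where=(weight=1));
   /* EVENT= makes the modeled level explicit; drop it to keep the default level used above */
   model y(event='1') = x / dist=binomial link=logit;
   store work.train_model;
run;
/* Score every row (train + test); ILINK returns probabilities instead of logits */
proc plm restore=work.train_model;
   score data=panel out=preds_plm predicted=P_BAD / ilink;
run;&lt;/CODE&gt;&lt;/PRE&gt;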
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 14 Dec 2024 06:39:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/logististic-regression-work-on-train-data-or-all-data/m-p/953570#M372521</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2024-12-14T06:39:00Z</dc:date>
    </item>
  </channel>
</rss>

