<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is it statistically sound to include a form of the dependent variable as an independent variable in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891491#M44184</link>
    <description>&lt;P&gt;For your&amp;nbsp;&lt;SPAN&gt;Fahrenheit example, basically what I am asking is, what is the statistical issue with using the average of August 2022 temperature to predict August 29, 2023's temperature?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 29 Aug 2023 14:05:30 GMT</pubDate>
    <dc:creator>smithcl13</dc:creator>
    <dc:date>2023-08-29T14:05:30Z</dc:date>
    <item>
      <title>Is it statistically sound to include a form of the dependent variable as an independent variable?</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891379#M44178</link>
      <description>&lt;P&gt;This is my first time posting to the SAS community so thank you all for your help!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a more statistical theory question. In a linear regression model, would it be statistically sound to use the average of the dependent as an independent variable?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Specifically, I am building a model to predict the number of days until project completion. I am interested in creating a variable named “average days” which is the average amount of time to complete a project by zip code. For the aggregation, a given data point’s project time would not contribute to the average for that particular observation, but would be used for other data points in the same zip code. The new variable “average days” is not strongly correlated with other variables in the model (strongest correlation is 0.4 with one other variable that determines the likelihood that a permit will be needed for the job). I attached a sample of the code if interested.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Assuming there is no multicollinearity and/or model overfitting, would there be statistical or mathematical concerns with this method? If so, could you explain why?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Aug 2023 20:35:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891379#M44178</guid>
      <dc:creator>smithcl13</dc:creator>
      <dc:date>2023-08-28T20:35:30Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891390#M44179</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;In a linear regression model, would it be statistically sound to use the average of the dependent as an independent variable?&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;In general, no. You can't simply relabel a function of the dependent variable and treat it as an independent variable. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Instead of doing what you described, perhaps zip code could be a categorical predictor, or you could use some other measure of the zip code, such as average income, average age, or anything else you deem relevant.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Aug 2023 19:25:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891390#M44179</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2023-08-28T19:25:05Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891395#M44180</link>
      <description>&lt;P&gt;Thank you for your reply. Would you mind explaining why, mathematically or in terms of statistical assumptions, it wouldn't be feasible? I understand this is not done often, but I am trying to understand specifically why.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Aug 2023 20:01:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891395#M44180</guid>
      <dc:creator>smithcl13</dc:creator>
      <dc:date>2023-08-28T20:01:13Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891396#M44181</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/443584"&gt;@smithcl13&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;This is my first time posting to the SAS community so thank you all for your help!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a more statistical theory question. In a linear regression model, would it be statistically sound to use the average of the dependent as an independent variable?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Specifically, I am building a model to predict the number of days until project completion. I am interested in creating a variable named “average days” which is the average amount of time to complete a project by zip code. For the aggregation, a given data point’s project time would not contribute to the average for that particular observation, &lt;STRONG&gt;but would be used for other data points in the same zip code.&lt;/STRONG&gt; The new variable “average days” is not strongly correlated with other variables in the model (strongest correlation is 0.4 with one other variable that determines the likelihood that a permit will be needed for the job). I attached a sample of the code if interested.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;How would that aggregate be used for other data points?&amp;nbsp; Is this to impute some value that is occasionally missing for some records?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am afraid that SQL code doesn't really help this discussion.&lt;/P&gt;</description>
      <pubDate>Mon, 28 Aug 2023 20:04:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891396#M44181</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-08-28T20:04:59Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891458#M44183</link>
      <description>&lt;P&gt;The whole point of fitting a model to data is to determine which independent variables are useful in predicting the dependent variable. Including some function of the dependent variable in the model as an independent variable violates this fundamental principle.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In an extreme case, if you wanted to create a model to predict temperature Fahrenheit, you could use a function of temperature Fahrenheit such as temperature Celsius as an independent variable, and you would get an R-squared = 1. Obviously, this is not a valid use of modeling. But where do you draw the line, what functions are allowed and what functions are not allowed? I would say ... draw the line at not allowing ANY functions of the dependent variable into the model as independent variables.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Aug 2023 11:20:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891458#M44183</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2023-08-29T11:20:12Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891491#M44184</link>
      <description>&lt;P&gt;For your&amp;nbsp;&lt;SPAN&gt;Fahrenheit example, basically what I am asking is, what is the statistical issue with using the average of August 2022 temperature to predict August 29, 2023's temperature?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Aug 2023 14:05:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891491#M44184</guid>
      <dc:creator>smithcl13</dc:creator>
      <dc:date>2023-08-29T14:05:30Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891498#M44185</link>
      <description>&lt;P&gt;Seems like you have changed the question to use last year's temperature to predict this year's temperature. I don't have a problem with that. You are not using the dependent variable (this year's temperature) or a function of this year's temperature as an independent variable in the prediction.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Aug 2023 14:18:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891498#M44185</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2023-08-29T14:18:25Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891543#M44186</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/443584"&gt;@smithcl13&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;For your&amp;nbsp;&lt;SPAN&gt;Fahrenheit example, basically what I am asking is, what is the statistical issue with using the average of August 2022 temperature to predict August 29, 2023's temperature?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I was actually involved in analyzing the behavior of two different approaches to weather simulation, covering values such as daily max/min temperatures and precipitation. One model used a monthly parameterization, which was basically a monthly average plus a couple of range parameters to provide variety. When graphing a summary of simulated mean/max/min temperatures across calendar dates, there was a notable stair-step pattern in the generated values: the first of the month would show a marked "jump" in temperatures, and values were then fairly similar from the start to the end of the month. Superimposing a historical record of the same summaries showed that the simulation did not track the more gradual recorded values around the start and end of each month.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can read an online version of the PDF at the link below. Page 13 shows the graphs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://journals.ametsoc.org/view/journals/apme/35/10/1520-0450_1996_035_1878_swsoaa_2_0_co_2.xml" target="_blank"&gt;https://journals.ametsoc.org/view/journals/apme/35/10/1520-0450_1996_035_1878_swsoaa_2_0_co_2.xml&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One of our conclusions was that, if a simulation with monthly parameters is used to predict weather effects such as floods, the timing of likely events could be shifted considerably toward the start or end of a calendar month. The rather abrupt changes would also affect temperature-related quantities such as power demand for heating/cooling or snowmelt runoff volumes.&lt;/P&gt;</description>
      <pubDate>Tue, 29 Aug 2023 15:20:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891543#M44186</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-08-29T15:20:59Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891616#M44192</link>
      <description>&lt;P&gt;For each observation i, you propose to use as a regressor the mean of the dependent variable y, excluding the i'th value of y.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But I believe this&amp;nbsp; means y&lt;SUB&gt;i&lt;/SUB&gt;&amp;nbsp;is really being regressed on itself.&amp;nbsp; Consider the algebra below, starting with my understanding of your proposed regression model:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="6"&gt;&amp;nbsp; y&lt;SUB&gt;i&lt;/SUB&gt; = α + β&lt;SUB&gt;1&lt;/SUB&gt;(Mean of y excluding y&lt;SUB&gt;i&lt;/SUB&gt;) + β&lt;SUB&gt;2&lt;/SUB&gt;x&lt;SUB&gt;i&lt;/SUB&gt; + … + e&lt;SUB&gt;i&lt;/SUB&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;which, after a little algebra (remember ny̅ is sum of all y&lt;SUB&gt;i&lt;/SUB&gt;):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="6"&gt;&amp;nbsp; y&lt;SUB&gt;i&lt;/SUB&gt; = α + β&lt;SUB&gt;1&lt;/SUB&gt;(ny̅ - y&lt;SUB&gt;i&lt;/SUB&gt;)/(n-1) + β&lt;SUB&gt;2&lt;/SUB&gt;x&lt;SUB&gt;i&lt;/SUB&gt; + … + e&lt;SUB&gt;i&lt;/SUB&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;which becomes&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="6"&gt;&amp;nbsp; y&lt;SUB&gt;i&lt;/SUB&gt; = α + β&lt;SUB&gt;1&lt;/SUB&gt;ny̅/(n-1) - β&lt;SUB&gt;1&lt;/SUB&gt;y&lt;SUB&gt;i&lt;/SUB&gt;/(n-1) + β&lt;SUB&gt;2&lt;/SUB&gt;x&lt;SUB&gt;i&lt;/SUB&gt; + … + e&lt;SUB&gt;i&lt;/SUB&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The first two terms above,&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;α&lt;/STRONG&gt;&lt;/EM&gt;&amp;nbsp;and&amp;nbsp;&lt;EM&gt;&lt;STRONG&gt;β&lt;SUB&gt;1&lt;/SUB&gt;ny̅/(n-1)&lt;/STRONG&gt;&lt;/EM&gt;, are just constants.&amp;nbsp; So just define a new constant term,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="6"&gt;&amp;nbsp; α&lt;SUB&gt;2&lt;/SUB&gt; ≡ α + β&lt;SUB&gt;1&lt;/SUB&gt;ny̅/(n-1)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;which means your regression is really&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT size="6"&gt;&amp;nbsp; y&lt;SUB&gt;i&lt;/SUB&gt; = α&lt;SUB&gt;2&lt;/SUB&gt; - β&lt;SUB&gt;1&lt;/SUB&gt;y&lt;SUB&gt;i&lt;/SUB&gt;/(n-1) + β&lt;SUB&gt;2&lt;/SUB&gt;x&lt;SUB&gt;i&lt;/SUB&gt; + … + e&lt;SUB&gt;i&lt;/SUB&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This looks like a regression of&amp;nbsp;y&lt;SUB&gt;i&lt;/SUB&gt;&amp;nbsp;on itself.&amp;nbsp; In fact, I would expect it to yield an R-squared of one, which is what happened when I simulated my understanding of your proposal below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
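&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have ;
  /* Read sashelp.class twice: pass 1 accumulates the total weight, pass 2 outputs rows */
  set sashelp.class (in=firstpass)   sashelp.class (in=second_pass);
  if firstpass then total_wgt+weight;

  /* Keep only second-pass rows; total_wgt is now the sum over all 19 students */
  if second_pass;
  /* Leave-one-out mean: (n*ybar - y_i)/(n-1), with n-1 = 18 */
  mean_excluding_current_wgt=(total_wgt-weight)/18;
run;


proc reg data=have;
  model weight=age mean_excluding_current_wgt;
  run;
quit;&lt;/CODE&gt;&lt;/PRE&gt;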
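&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The simulation above uses one overall mean. As a rough sketch of the by-group version described in the original question (a leave-one-out average within each zip code), the step below uses sex in sashelp.class as a stand-in for zip code. The table name have_by_group and the variable mean_excl_by_group are made up for illustration, and this is only one reading of the proposal, not the original poster's code. Within each group the new regressor is still an exact decreasing linear function of the current weight, namely (group total - weight)/(group size - 1), but the group total and group size now differ between groups, so the overall fit need not be exactly perfect.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Assumption: sex plays the role of zip code; names below are hypothetical */
proc sql;
  create table have_by_group as
  select *,
         /* PROC SQL remerges the group totals back onto each row */
         (sum(weight) - weight) / (count(weight) - 1) as mean_excl_by_group
  from sashelp.class
  group by sex;
quit;

/* Regress weight on age and the by-group leave-one-out mean */
proc reg data=have_by_group;
  model weight = age mean_excl_by_group;
run;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A group with a single observation would give a zero denominator and therefore a missing value for the new variable; both groups in sashelp.class have more than one row, so that case does not arise here.&lt;/P&gt;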
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 29 Aug 2023 23:26:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891616#M44192</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2023-08-29T23:26:34Z</dc:date>
    </item>
    <item>
      <title>Re: Is it statistically sound to include a form of the dependent variable as an independent variable</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891623#M44194</link>
      <description>&lt;P&gt;Check if you have SAS/ETS licensed.&amp;nbsp; It has procedures for doing TIME SERIES analysis, which is what you are proposing.&lt;/P&gt;</description>
      <pubDate>Wed, 30 Aug 2023 01:38:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Is-it-statistically-sound-to-include-a-form-of-the-dependent/m-p/891623#M44194</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2023-08-30T01:38:28Z</dc:date>
    </item>
  </channel>
</rss>

