<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Multiple regression for non-normal data in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117956#M6177</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;1. A Q-Q plot is a plot of the quantiles of the data against the quantiles of a known distribution. If it's not linear, then the distributions are not the same. I don't see it in the output.&lt;/P&gt;&lt;P&gt;As for the difference between the two, it's a personal preference as to which one makes it easier to see whether normality is violated. The residual vs predicted plot doesn't look that bad to me, except for the 3 outliers at the top. Is there a reason for those cases?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2. It really looks like you have 3 outliers overall that are influencing your results quite a bit; I'd try dealing with those somehow first. They show up in all the residual plots. For PBS you also have a lot of observations at or very close to zero. Are those valid responses? What does a log transform do? The variables are all on different scales, so you can also consider standardizing them.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;3. I don't know if there's an automatic backwards regression, but you have 6 variables, so it's really easy to do manually IMO.&amp;nbsp; Fit the model with all variables and remove variables iteratively until you're satisfied. I won't go into all the reasons backwards regression isn't a good method :)&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 24 Apr 2013 14:32:49 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2013-04-24T14:32:49Z</dc:date>
    <item>
      <title>Multiple regression for non-normal data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117952#M6173</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello SAS Users (my second post today and ever!)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am overwhelmed by the number of available statistical procedures in SAS, and am hoping for someone to nudge me in the right direction.&amp;nbsp; Stats are not my strength, so any attempt you could make to simplify your explanation would be appreciated. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am an ecologist.&amp;nbsp; My research question:&amp;nbsp; how do various environmental variables (such as light, soil moisture, etc.) affect the density of particular tree seedlings (or saplings)? &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So, basically, for each analysis, I have one response variable (seedling density) and multiple predictor variables.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I ran this analysis in PROC REG, specifying a backwards regression.&amp;nbsp; However, as the response variables are based on counts, they are very non-normal (heavily skewed to the right, because of many zeroes).&amp;nbsp; Additionally, many of my predictor variables are heavily skewed to the right or left.&amp;nbsp; I have tried various transformations of both the predictor and response variables to satisfy the assumptions of homoscedasticity and normality of residuals, and have given up on that approach. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am reading about a lot of different procedures now, but am just not sure which would be the best for me to start learning about and working with.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;My sample size is n=60. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I also should add that scatter plots of individual predictors vs the dependent variables do not suggest a particularly strong relationship with any one variable.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any advice would be appreciated.&amp;nbsp; Also, please do not hesitate to ask if you would like me to supply you with more information.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Meghan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 23 Apr 2013 22:40:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117952#M6173</guid>
      <dc:creator>mrlang02</dc:creator>
      <dc:date>2013-04-23T22:40:25Z</dc:date>
    </item>
    <item>
      <title>Re: Multiple regression for non-normal data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117953#M6174</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If your response is counts, you could consider Poisson regression. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;One thing, though: there are no assumptions about the distribution of your predictor variables, only about the residuals. Can you post your best Q-Q plot to show the violation of normality for the residuals?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 24 Apr 2013 02:44:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117953#M6174</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2013-04-24T02:44:22Z</dc:date>
    </item>
    <item>
      <title>Re: Multiple regression for non-normal data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117954#M6175</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Or perhaps you could use a normalizing transformation?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 24 Apr 2013 05:17:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117954#M6175</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2013-04-24T05:17:27Z</dc:date>
    </item>
    <item>
      <title>Re: Multiple regression for non-normal data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117955#M6176</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Reeza, thank you for your response.&amp;nbsp; I have several follow-up questions, which I have tried to break up as neatly as possible.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1.&amp;nbsp; As requested, here are my diagnostic graphs from the SAS output, including the Q-Q plot.&amp;nbsp; &lt;SPAN style="text-decoration: underline;"&gt;&lt;STRONG&gt;Question&lt;/STRONG&gt;&lt;/SPAN&gt;: What does the Q-Q plot (assuming that is the one that has "quantile" in the label) show you that the plot of "percent" vs "residual" doesn't?&amp;nbsp; Are they both there to allow you to check for normality (points plotting close to the line on the Q-Q plot, and a normal curve on the residual-percent plot)? &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;IMG alt="DiagnosticsPanel.png" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/3464_DiagnosticsPanel.png" width="450" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2.&amp;nbsp; As you can see, the residuals vs predicted value plot looks bad (above).&amp;nbsp; My approach to attempt to remedy the unequal variance was to try to transform variables: first the dependent, then one or more independent variables.&amp;nbsp; In this case, nothing helped much.&amp;nbsp; &lt;SPAN style="text-decoration: underline;"&gt;Please let me know if this is not a valid approach&lt;/SPAN&gt;. &lt;/P&gt;&lt;P&gt;&lt;IMG alt="residual vs reg.png" class="jive-image-thumbnail jive-image" src="https://communities.sas.com/legacyfs/online/3465_residual vs reg.png" width="450" /&gt;&lt;/P&gt;&lt;P&gt; In another similar analysis (where I had fewer zeroes) I transformed the dependent variable with a reciprocal log to make it normal.&amp;nbsp; Then, I ran the regression and looked at the residual by regressor plots for &lt;STRONG&gt;individual predictor variables (shown below).&amp;nbsp; &lt;/STRONG&gt; For predictors where there was a cone shape (e.g. PBS, PCWD below), I tried a transformation to make the predictor more normal, and in some cases this did improve the residual by regressor plots, giving random scatter.&amp;nbsp; &lt;SPAN style="text-decoration: underline;"&gt;&amp;nbsp; Was this a valid approach?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;3.&amp;nbsp; Poisson regression.&amp;nbsp; I have considered using this; however, I cannot find a way in SAS to do &lt;STRONG&gt;backwards&lt;/STRONG&gt; elimination with multiple Poisson regression.&amp;nbsp; I used PROC GENMOD to attempt this, and couldn't find a way to specify it. &lt;SPAN style="text-decoration: underline;"&gt; If you know of one, please let me know.&lt;/SPAN&gt;&amp;nbsp; If I can find out how to do that, I will probably have more questions about Poisson regression, but I'll let it lie for now.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks again, and if you made it this far, you should get cookies or something!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Meghan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 24 Apr 2013 14:11:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117955#M6176</guid>
      <dc:creator>mrlang02</dc:creator>
      <dc:date>2013-04-24T14:11:51Z</dc:date>
    </item>
    <item>
      <title>Re: Multiple regression for non-normal data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117956#M6177</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;1. A Q-Q plot is a plot of the quantiles of the data against the quantiles of a known distribution. If it's not linear, then the distributions are not the same. I don't see it in the output.&lt;/P&gt;&lt;P&gt;As for the difference between the two, it's a personal preference as to which one makes it easier to see whether normality is violated. The residual vs predicted plot doesn't look that bad to me, except for the 3 outliers at the top. Is there a reason for those cases?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2. It really looks like you have 3 outliers overall that are influencing your results quite a bit; I'd try dealing with those somehow first. They show up in all the residual plots. For PBS you also have a lot of observations at or very close to zero. Are those valid responses? What does a log transform do? The variables are all on different scales, so you can also consider standardizing them.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;3. I don't know if there's an automatic backwards regression, but you have 6 variables, so it's really easy to do manually IMO.&amp;nbsp; Fit the model with all variables and remove variables iteratively until you're satisfied. I won't go into all the reasons backwards regression isn't a good method :)&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 24 Apr 2013 14:32:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117956#M6177</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2013-04-24T14:32:49Z</dc:date>
    </item>
    <item>
      <title>Re: Multiple regression for non-normal data</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117957#M6178</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Reeza,&amp;nbsp; Thanks for your thorough answers.&amp;nbsp; I know I'm asking a lot of questions, but I am not getting much help locally, and I will never graduate if I don't seek outside advice to get me headed in the right direction.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1. So the residual vs quantile plot is &lt;STRONG&gt;not&lt;/STRONG&gt; a Q-Q plot (second row of first graph, first graph)?&amp;nbsp; If not, what does this graph show me?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2. I will investigate the outliers.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;3. I'm re-pasting an unanswered&amp;nbsp; question I had above.&amp;nbsp; &lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;"In another similar analysis (where I had fewer zeroes in the dependent variable) I transformed the dependent variable with a reciprocal log to make it normal.&amp;nbsp; Then, I ran the regression and looked at the residual by regressor plots for &lt;/SPAN&gt;&lt;STRONG style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;individual predictor variables (shown below).&amp;nbsp; &lt;/STRONG&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;For predictors where there was a cone shape (e.g. PBS, PCWD below), I tried a transformation to make the predictor more normal, and in some cases this did improve the residual by regressor plots, giving random scatter.&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff; text-decoration: underline;"&gt;&amp;nbsp; Was this a valid approach?"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks again, &lt;/P&gt;&lt;P&gt;Meghan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 24 Apr 2013 14:51:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Multiple-regression-for-non-normal-data/m-p/117957#M6178</guid>
      <dc:creator>mrlang02</dc:creator>
      <dc:date>2013-04-24T14:51:47Z</dc:date>
    </item>
  </channel>
</rss>

