s1368485
Calcite | Level 5

Hello,

I would like to run a multiple regression analysis on data from 25 sample sites. My dependent variable is the total fish count per site, and my independent variables are (1) BMWP score and (2) PSI score, both of which are continuous numerical values. The scores reflect individual site conditions such as water quality and sedimentation. As I hardly ever use statistics, I was wondering whether anyone could tell me if multiple regression is appropriate for this type of data. I checked the distribution of the data and found that the fish counts are not normally distributed even after log transformation, which makes things more complicated.

Any input would be very much appreciated!

1 REPLY 1
PGStats
Opal | Level 21

A reminder: normality is assumed for the residuals of a model, not for the raw data. You must fit the regression first and then check the distribution of the residuals.
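As a rough sketch of that workflow, something like the following fits the regression, saves the residuals, and then examines their distribution. The dataset name (fish) and the variable names (fish_count, bmwp, psi) are assumptions made for illustration; substitute your own.

```sas
/* Assumed names: dataset FISH with variables FISH_COUNT, BMWP, PSI */

/* Step 1: fit the multiple regression and save residuals and predicted values */
proc reg data=fish;
   model fish_count = bmwp psi;
   output out=fish_resid r=residual p=predicted;
run;
quit;

/* Step 2: check the residuals (not the raw counts) for normality */
proc univariate data=fish_resid normal;
   var residual;
   qqplot residual / normal(mu=est sigma=est);
run;
```

The NORMAL option requests formal tests of normality, and the Q-Q plot gives a visual check. With only 25 sites, the plot is likely to be more informative than the tests.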

Now, if your fish counts are real observed counts (and not some kind of scaled estimate) they are more likely to be distributed as Poisson variates, possibly with overdispersion. SAS offers many tools to model those: GENMOD, GLIMMIX, COUNTREG.
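A minimal sketch of a count model in GENMOD might look like the following, again assuming a dataset called fish with variables fish_count, bmwp and psi (hypothetical names). The first step fits a Poisson regression with a log link; the second fits a negative binomial model, which is one common way to allow for overdispersion.

```sas
/* Assumed names: dataset FISH with variables FISH_COUNT, BMWP, PSI */

/* Poisson regression with a log link */
proc genmod data=fish;
   model fish_count = bmwp psi / dist=poisson link=log;
run;

/* Negative binomial regression, one option when counts are overdispersed */
proc genmod data=fish;
   model fish_count = bmwp psi / dist=negbin link=log;
run;
```

In the Poisson output, a Pearson chi-square divided by its degrees of freedom that is well above 1 is a common sign of overdispersion, in which case the negative binomial fit may be the more reasonable of the two.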

More details about the nature and origin of your data could get you better advice.

PG

