s1368485
Calcite | Level 5

Hello,

I would like to run a multiple regression analysis on data from 25 sample sites. My dependent variable is the total fish count per site, and my independent variables are (1) BMWP score and (2) PSI score, both of which are continuous numerical values. The scores reflect individual site conditions such as water quality and sedimentation. As I hardly ever use statistics, I was wondering if anyone could tell me whether multiple regression is appropriate for this type of data. I checked the distribution of the data and found that the fish counts are not normally distributed, even after log transformation, which makes things more complicated.

Any input would be very much appreciated!

1 REPLY
PGStats
Opal | Level 21

A reminder: normality is assumed for the residuals of a model, not for the raw data. You must fit the regression first and then check the distribution of the residuals.
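
To make the order of operations concrete, here is a minimal sketch in plain Python (standard library only, not SAS) that fits a two-predictor least-squares model and then builds normal Q-Q pairs from the residuals. The site values and the names `fit_ols`, `normal_qq_pairs`, `bmwp`, `psi` and `fish` are hypothetical and only for illustration; they are not the poster's data.

```python
from statistics import NormalDist, mean


def fit_ols(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2.

    Returns ((b0, b1, b2), residuals).
    """
    # Centre every variable so the intercept drops out of the normal equations.
    my, m1, m2 = mean(y), mean(x1), mean(x2)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]

    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))

    # Solve the 2x2 system for the slopes (fails if the predictors are collinear).
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2

    residuals = [yi - (b0 + b1 * a + b2 * b) for yi, a, b in zip(y, x1, x2)]
    return (b0, b1, b2), residuals


def normal_qq_pairs(residuals):
    """Pair each sorted residual with the matching standard normal quantile."""
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(residuals)))


# Hypothetical values for eight sites (illustration only).
bmwp = [45, 60, 72, 88, 95, 110, 52, 130]
psi = [30, 42, 55, 61, 48, 80, 25, 74]
fish = [3, 5, 9, 14, 10, 22, 2, 30]

coeffs, residuals = fit_ols(fish, bmwp, psi)
for quantile, resid in normal_qq_pairs(residuals):
    print(f"{quantile:7.3f}  {resid:8.3f}")
```

If the printed pairs fall roughly on a straight line when plotted, the normality assumption for the residuals is plausible. With only a handful of sites such a check is rough at best.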

Now, if your fish counts are real observed counts (and not some kind of scaled estimate), they are more likely to be distributed as Poisson variates, possibly with overdispersion. SAS offers several procedures for modelling such data, including GENMOD, GLIMMIX and COUNTREG.
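
As a rough illustration of what overdispersion means, the sketch below (plain Python, standard library only, not SAS) computes the Pearson chi-square divided by its residual degrees of freedom. Under a Poisson model the variance equals the mean, so this ratio should be close to 1, and values well above 1 point to overdispersion. The counts and the name `pearson_dispersion` are hypothetical. The example uses an intercept-only model, so it ignores BMWP and PSI; with a fitted regression you would pass that model's fitted means and parameter count instead.

```python
from statistics import mean


def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi_square / (len(counts) - n_params)


# Hypothetical fish counts for ten sites (illustration only).
fish_counts = [0, 3, 1, 12, 7, 0, 25, 4, 2, 9]

# Intercept-only model: every site gets the overall mean as its fitted value.
null_mean = mean(fish_counts)
dispersion = pearson_dispersion(
    fish_counts, [null_mean] * len(fish_counts), n_params=1
)
print(f"Pearson chi-square / df = {dispersion:.2f}")
```

For the intercept-only case this ratio is simply the sample variance divided by the sample mean. Part of any excess may be explained once the covariates enter the model, so it is a first look rather than a verdict.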

More details about the nature and origin of your data could get you better advice.

PG
