Hello,
I would like to do a multiple regression analysis with data from 25 sample sites. My dependent variable is total fish count per site and my independent variables are: (1) BMWP score (a continuous numerical value) and (2) PSI score (also a continuous numerical value). The scores reflect individual site conditions such as water quality and sedimentation. As I hardly ever use statistics, I was wondering if anyone could tell me whether it is appropriate to carry out multiple regression with this type of data. I checked the distribution of the data and found that the fish counts are not normally distributed even after log transformation, which makes things more complicated.
Any input would be very much appreciated!
A reminder: normality is assumed for the residuals of a model, not for the raw data. You must fit the regression first and then check the distribution of the residuals.
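To make the order of operations concrete, here is a minimal sketch in plain Python (not SAS syntax). The data values are invented for illustration and the helper names `solve` and `ols_residuals` are hypothetical. It fits an ordinary least-squares model with two predictors, computes the residuals, and pairs the sorted residuals with normal quantiles, which is the raw material for a Q-Q plot.

```python
from statistics import NormalDist

# Toy data, invented for illustration only (not the poster's measurements).
bmwp = [40.0, 55.0, 62.0, 70.0, 85.0, 90.0, 101.0, 120.0]
psi = [30.0, 42.0, 35.0, 60.0, 55.0, 72.0, 80.0, 78.0]
fish = [3.0, 5.0, 4.0, 9.0, 8.0, 14.0, 15.0, 21.0]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_residuals(y, *predictors):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, residuals)."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return beta, [yi - fi for yi, fi in zip(y, fitted)]


# Step 1: fit the model. Step 2: look at the residuals, not at the raw counts.
beta, residuals = ols_residuals(fish, bmwp, psi)

# Pair sorted residuals with standard normal quantiles (points of a Q-Q plot).
n = len(residuals)
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
qq_pairs = list(zip(theoretical, sorted(residuals)))
```

If the points in `qq_pairs` fall roughly on a straight line, the normality assumption for the residuals looks reasonable; strong curvature suggests otherwise.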
Now, if your fish counts are real observed counts (and not some kind of scaled estimate), they are more likely to be distributed as Poisson variates, possibly with overdispersion. SAS offers several procedures for modelling such data, including PROC GENMOD, PROC GLIMMIX and PROC COUNTREG.
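As a rough first look at overdispersion, one can compare the sample variance of the counts with their mean, since a Poisson variate has variance equal to its mean. The sketch below is plain Python with invented counts, and it ignores the covariates, so treat it only as a screening step. A proper check would use the fitted model, for example the Pearson chi-square divided by its degrees of freedom.

```python
from statistics import mean, variance

# Toy counts, invented for illustration only (not the poster's data).
counts = [0, 2, 1, 7, 3, 15, 0, 4, 22, 1]

count_mean = mean(counts)
count_var = variance(counts)  # sample variance (n - 1 denominator)

# Under a Poisson model with a common mean this ratio should be near 1.
# Values well above 1 hint at overdispersion.
dispersion_ratio = count_var / count_mean
print(f"mean={count_mean:.2f} variance={count_var:.2f} ratio={dispersion_ratio:.2f}")
```

A ratio far above 1 would point towards an overdispersed alternative such as a negative binomial model, although some of the excess variance may simply be explained by the predictors once they are in the model.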
More details about the nature and origin of your data could get you better advice.
PG