07-17-2014 02:07 PM
I would like to do a multiple regression analysis with data from 25 sample sites. My dependent variable is the total fish count per site, and my independent variables are (1) the BMWP score (a continuous numerical value) and (2) the PSI score (also a continuous numerical value). The scores reflect individual site conditions such as water quality and sedimentation. As I hardly ever use statistics, I was wondering if anyone could tell me whether it is appropriate to carry out a multiple regression with this type of data. I checked the distribution of the data and found that the fish counts are not normally distributed, even after a log transformation, which makes things more complicated.
Any input would be very much appreciated!
07-17-2014 02:40 PM
A reminder: normality is assumed for the residuals of a model, not for the raw data. You must fit the regression first and then check the distribution of the residuals.
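The thread contains no code, so here is a minimal pure-Python sketch (not SAS) of that order of operations: fit the two-predictor regression first, then examine the residuals rather than the raw counts. The function names (`solve`, `ols_two_predictors`, `skewness`) are hypothetical, `x1` and `x2` are assumed stand-ins for the BMWP and PSI scores, and skewness is only a rough symmetry check, not a formal normality test.

```python
# Minimal sketch: fit y = b0 + b1*x1 + b2*x2 by ordinary least squares,
# then return the residuals so that THEIR distribution (not raw y) can be checked.
# x1, x2 are hypothetical stand-ins for the BMWP and PSI scores.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_two_predictors(x1, x2, y):
    """Return (coefficients, residuals) for y ~ intercept + x1 + x2."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    beta = solve(xtx, xty)
    fitted = [sum(bk * v for bk, v in zip(beta, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return beta, residuals


def skewness(values):
    """Sample skewness; values near 0 are consistent with a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

In SAS, the usual route would be to save the residuals from PROC REG with an OUTPUT statement (e.g. the `R=` option) and examine them with PROC UNIVARIATE, whose `NORMAL` option prints formal normality tests; check the SAS documentation for the exact syntax.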
Now, if your fish counts are real observed counts (and not some kind of scaled estimate), they are more likely to follow a Poisson distribution, possibly with overdispersion. SAS offers several procedures for modelling such data: GENMOD, GLIMMIX and COUNTREG.
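As a hedged illustration of the Poisson suggestion, here is a pure-Python sketch of a Poisson regression with a log link, fitted by iteratively reweighted least squares (IRLS), followed by the Pearson chi-square divided by its degrees of freedom, a common rough overdispersion check. This is not how GENMOD, GLIMMIX or COUNTREG are implemented internally; the function names are hypothetical and `x1`/`x2` again stand in for the two site scores. Under a Poisson model the ratio should be near 1; a value well above 1 suggests overdispersion.

```python
import math

# Minimal sketch: Poisson regression log(mu) = b0 + b1*x1 + b2*x2 via IRLS,
# plus the Pearson chi-square / (n - p) dispersion ratio.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def poisson_irls(x1, x2, y, max_iter=100, tol=1e-10):
    """Return (coefficients, Pearson dispersion ratio) for a Poisson GLM with log link."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    beta = [math.log(sum(y) / len(y)), 0.0, 0.0]  # start at the mean count
    for _ in range(max_iter):
        eta = [sum(bk * v for bk, v in zip(beta, r)) for r in rows]
        mu = [math.exp(e) for e in eta]
        # Log link: working response z = eta + (y - mu)/mu, working weight w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * r[i] * r[j] for r, m in zip(rows, mu)) for j in range(3)]
                for i in range(3)]
        xtwz = [sum(m * r[i] * zi for r, m, zi in zip(rows, mu, z)) for i in range(3)]
        new_beta = solve(xtwx, xtwz)
        converged = max(abs(nb - b) for nb, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = [math.exp(sum(bk * v for bk, v in zip(beta, r))) for r in rows]
    pearson = sum((yi - m) ** 2 / m for yi, m in zip(y, mu))
    return beta, pearson / (len(y) - len(beta))
```

In SAS, PROC GENMOD with `DIST=POISSON LINK=LOG` fits this kind of model, and `DIST=NEGBIN` is one common alternative when the counts turn out to be overdispersed.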
More details about the nature and origin of your data could get you better advice.