KaraG
Fluorite | Level 6

I am working with a very large dataset relating fishing effort to spatial location. The sampling unit is the individual fishing boat; boats were surveyed on random days with the goal of capturing 20% of the population. When a boat was surveyed, the variables collected included the target fish species, how much was caught, how many anglers were aboard, how many days the boat was out fishing before returning to shore, and the "block" it was fishing in. I want to look at summary statistics by block and species, but I have numerous instances where only one boat was recorded fishing in a given block. I am unsure whether I should drop any blocks that have only one observation, or even any blocks with fewer than three observations. On the one hand, it seems like those blocks should be dropped due to small sample size or no replication. On the other hand, if the sampling unit is the boat and not the block, it seems like block would be just another dependent variable being collected and should not be dropped.
Thank you in advance for any suggestions.

2 REPLIES
mkeintz
PROC Star

@KaraG wrote:

...

On the one hand, it seems like those blocks should be dropped due to small sample size or no replication. On the other hand, if the sampling unit is the boat and not the block, it seems like block would be just another dependent variable being collected and should not be dropped.


And on the third hand, perhaps the reason a block has a small sample is that the boat operators knew something unattractive about the block on the day of sampling. If so, I presume you would want to capture that information.
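
One way to keep that information visible is to report every block-by-species cell together with its sample size and flag the sparse ones, rather than dropping them. The following is only a minimal sketch in plain Python: the records, the field layout (block, species, catch), and the threshold `MIN_N = 3` are all hypothetical and are not taken from the original data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical survey records: (block, species, catch).
# The values are invented purely for illustration.
trips = [
    ("A1", "cod", 12.0),
    ("A1", "cod", 8.0),
    ("A1", "cod", 10.0),
    ("A2", "cod", 5.0),
    ("A2", "haddock", 7.0),
    ("A2", "haddock", 9.0),
]

MIN_N = 3  # assumed reporting threshold, not a statistical rule


def summarize(records, min_n=MIN_N):
    """Summarize catch per (block, species) cell and flag sparse cells."""
    cells = defaultdict(list)
    for block, species, catch in records:
        cells[(block, species)].append(catch)

    summary = {}
    for key, values in sorted(cells.items()):
        n = len(values)
        summary[key] = {
            "n": n,
            "mean": mean(values),
            # A standard deviation needs at least two boats in the cell.
            "sd": stdev(values) if n > 1 else None,
            # Sparse cells are kept in the output and only marked.
            "sparse": n < min_n,
        }
    return summary


summary = summarize(trips)
for (block, species), stats in summary.items():
    print(block, species, stats)
```

Because the boat count travels with each summary, a reader can discount a one-boat block without that block disappearing from the table.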

 

 

SteveDenham
Jade | Level 19

Could you consider "block" as a spatial co-ordinate based on the centroid of the area?  Then a spatial analysis with boat as the subject could be explored.

 

It would probably mean using a mixed model approach with sample weighting because this is a survey rather than a designed experiment, and a two-dimensional spatial covariance type would be needed, so you may be in some rough waters (no pun intended there).
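
To make the spatial covariance idea concrete, here is a small sketch of one common two-dimensional structure, the exponential model, in which the covariance between two observations decays with the distance between their block centroids. This is an illustration only: the centroid coordinates, the `sill` and `range_` values, and the choice of the exponential form are all assumptions, and the post does not say which structure would suit these data. In SAS, PROC MIXED offers this form as `TYPE=SP(EXP)`.

```python
import math

# Hypothetical block centroids (x, y), e.g. in km on a projected grid.
centroids = {
    "A1": (0.0, 0.0),
    "A2": (3.0, 4.0),
    "B1": (6.0, 8.0),
}


def exp_cov(p, q, sill=1.0, range_=5.0):
    """Exponential spatial covariance: sill * exp(-distance / range)."""
    distance = math.dist(p, q)  # Euclidean distance between centroids
    return sill * math.exp(-distance / range_)


def cov_matrix(points, sill=1.0, range_=5.0):
    """Return block names and the covariance matrix implied by their centroids."""
    names = sorted(points)
    matrix = [
        [exp_cov(points[a], points[b], sill, range_) for b in names]
        for a in names
    ]
    return names, matrix


names, cov = cov_matrix(centroids)
for name, row in zip(names, cov):
    print(name, [round(value, 3) for value in row])
```

Blocks whose centroids are close together get a covariance near the sill, while distant blocks are nearly independent. This is how a sparsely sampled block can borrow information from its neighbours in a fitted model.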

 

You may want to look at some of the ecology literature on analysis of this type.  I know that field has a strong R preference, but similar models can be constructed in SAS.

 

SteveDenham

