11-13-2015 11:09 AM
I found an excellent example of how to use SAS PROC CLUSTER to do a cluster analysis. My problem is that my data is set up in a very different way. I am hoping someone can give me advice on how to rearrange my data and then proceed with SAS PROC CLUSTER. I provide details below.
The example that I am following has the object that you want to organize (fish species in this case) listed in the first column. For example,
               Variable X   Variable X2   Variable X3   Variable X4
Fish species   ###          ###           ####          ####
I am trying to do the same thing. I want to look for clusters of fish species. However, my data is set up in a different way. I want to see what fish species are caught with other fish species on fishing trips. So each row is a trip. An example is below.
         Fish species 1   Fish species 2   Fish species 3   Fish species 4   Depth   Gear
Trip A   10               3                6                9                22      trap
Trip B   7                8                11               7                31      trap
Trip C   6                0                0                4                55      hook
My objective is to do a cluster analysis that will group the fish species by looking at the data from fishing trips. For example, maybe fish species 4 should be grouped with fish species 1 because they are frequently caught together and also at similar depths and gear.
I am stuck so any help would be greatly appreciated.
11-13-2015 11:19 AM
11-13-2015 12:12 PM
This doesn't strike me as a cluster analysis either.
It strikes me more as equivalent to a "market basket" analysis, where companies analyze which products are purchased together.
But I have never done a market basket analysis. So that's all I can say.
11-14-2015 03:33 PM
To cluster the species and illustrate their association with gear and depth, I would start with correspondence analysis. Try this:
data fishNumbers;
    length tripID $16;
    input Trip $ Fish_species_1 Fish_species_2 Fish_species_3 Fish_species_4 Depth Gear $;
    tripID = catx(" ", Trip, Gear, depth);
    drop trip gear;
    datalines;
A 10 3 6 9 22 trap
B 7 8 11 7 31 trap
C 6 0 0 4 55 hook
...
;

proc corresp data=fishNumbers;
    var Fish_species_:;
    id tripID;
run;

Look at the resulting graph. Distances between species, and between species and trips, indicate co-occurrence (correspondence).
11-16-2015 09:25 AM
Very helpful PG. Thanks.
In my first post I had an idea of how I wanted to organize the data. However, this weekend I failed at getting my data organized in the way that I like. I am hoping somebody can help me reorganize my data. My attempts at proc summary have been unsuccessful.
I have multiple rows for the same trip. Each row is for a specific fish species. Here is an example of what I have:
Trip Gear Species depth pounds
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
I want SAS to summarize my landings in pounds for each species for each trip, so that each row is one trip. I want SAS to reshape my data into the format shown below:
Trip Gear Depth fish1 fish2 fish3 fish4
A trap 55 12 4 3 0
B trap 40 0 18 0 16
C hook 59 0 0 21 5
Any help would be greatly appreciated.
11-16-2015 11:47 AM
What you need for correspondence analysis are fish numbers, not pounds. Assuming you have numbers:

data have;
    length trip gear species $16;
    input Trip Gear Species depth number;
    datalines;
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
;

proc sort data=have;
    by trip gear depth species;
run;

data fish;
    set have;
    tripID = catx(" ", Trip, Gear, depth);
run;

proc corresp data=fish;
    tables tripID, species;
    weight number;
run;
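
If you still want the one-row-per-trip layout you asked about, here is a minimal sketch using PROC SUMMARY followed by PROC TRANSPOSE. It is only an illustration under some assumptions: the dataset names landings, totals and wide are made up, the species values are assumed to be valid SAS variable names starting with "fish" (as in your sample), and depth and gear are assumed to be constant within a trip.

```sas
/* Sample data in the long format from the question (one row per trip and species) */
data landings;                         /* hypothetical dataset name */
    length trip gear species $16;
    input trip $ gear $ species $ depth pounds;
    datalines;
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
;

/* Total pounds per trip and species, in case a species appears on more than one row */
proc summary data=landings nway;
    class trip gear depth species;
    var pounds;
    output out=totals(drop=_type_ _freq_) sum=pounds;
run;

/* One row per trip, one column per species */
proc transpose data=totals out=wide(drop=_name_);
    by trip gear depth;
    id species;
    var pounds;
run;

/* Species not caught on a trip come out missing; set them to 0 */
data wide;
    set wide;
    array catch{*} fish:;              /* assumes every species column starts with "fish" */
    do i = 1 to dim(catch);
        if catch{i} = . then catch{i} = 0;
    end;
    drop i;
run;

proc print data=wide noobs;
run;
```

The PROC SUMMARY step is there so that PROC TRANSPOSE never sees the same species twice within a trip, which would otherwise stop with a duplicate ID error. The resulting wide table could then be passed to PROC CORRESP with a VAR statement, as in the first example, keeping in mind the caveat above about numbers versus pounds.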