I found an excellent example of how to use SAS PROC CLUSTER to do a cluster analysis. My problem is that my data is set up in a very different way. I am hoping someone can give me advice on how to rearrange my data and then proceed with SAS PROC CLUSTER. I provide details below.
The example that I am following has the object that you want to organize (fish species in this case) listed in the first column. For example,
Variable X Variable X2 Variable X3 Variable X4
Fish species ### ### #### ####
I am trying to do the same thing. I want to look for clusters of fish species. However, my data is set up in a different way. I want to see what fish species are caught with other fish species on fishing trips. So each row is a trip. An example is below.
        Fish species 1  Fish species 2  Fish species 3  Fish species 4  Depth  Gear
Trip A  10              3               6               9               22     trap
Trip B  7               8               11              7               31     trap
Trip C  6               0               0               4               55     hook
My objective is to do a cluster analysis that will group the fish species by looking at the data from fishing trips. For example, maybe fish species 4 should be grouped with fish species 1 because they are frequently caught together and also at similar depths and gear.
I am stuck so any help would be greatly appreciated.
This doesn't strike me as a cluster analysis either.
It strikes me more as equivalent to a "market basket" analysis, where companies analyze which products are purchased together.
But I have never done a market basket analysis. So that's all I can say.
To cluster the species and illustrate their association with gear and depth, I would start with correspondence analysis. Try this:
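/* One row per trip: counts of each species plus depth and gear */
data fishNumbers;
    length tripID $16;
    input Trip $ Fish_species_1 Fish_species_2 Fish_species_3 Fish_species_4 Depth Gear $;
    /* Build a label such as "A trap 22" to identify each trip on the plot */
    tripID = catx(" ", Trip, Gear, Depth);
    drop Trip Gear;
    datalines;
A 10 3 6 9 22 trap
B 7 8 11 7 31 trap
C 6 0 0 4 55 hook
...
;

/* Correspondence analysis of the trip-by-species table */
proc corresp data=fishNumbers;
    var Fish_species_:;
    id tripID;
run;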
Look at the resulting graph. The distances among species, and between species and trips, indicate co-occurrence (correspondence).
Very helpful PG. Thanks.
In my first post I had an idea of how I wanted to organize the data. However, this weekend I failed at getting my data organized in the way that I like. I am hoping somebody can help me reorganize my data. My attempts at proc summary have been unsuccessful.
I have multiple rows for the same trip. Each row is for a specific fish species. Here is an example of what I have:
Trip Gear Species depth pounds
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
I want SAS to summarize my landings in pounds for each species for each trip, so that each row is one trip. I want SAS to reshape my data into the format shown below:
Trip Gear Depth fish1 fish2 fish3 fish4
A trap 55 12 4 3 0
B trap 40 0 18 0 16
C hook 59 0 0 21 5
Any help would be greatly appreciated.
What you need for correspondence analysis are fish numbers, not pounds. Assuming you have numbers:
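data have;
    /* The LENGTH statement declares the three text variables as character */
    length trip gear species $16;
    input Trip Gear Species depth number;
    datalines;
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
;

proc sort data=have; by trip gear depth species; run;

/* Label each trip with its gear and depth, e.g. "A trap 55" */
data fish;
    set have;
    tripID = catx(" ", Trip, Gear, depth);
run;

/* Cross-tabulate trips by species, weighting each row by the number caught */
proc corresp data=fish;
    tables tripID, species;
    weight number;
run;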
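If you still want the one-row-per-trip layout from your post, PROC TRANSPOSE is one way to get it. This is only a sketch, assuming the have data set above with its number variable (use your pounds variable instead if that is what you have). The names have_sorted, wide and fishcols are made up for illustration.

```sas
/* Sort so that each trip forms one BY group */
proc sort data=have out=have_sorted;
    by trip gear depth;
run;

/* Turn each species value into its own column, one row per trip */
proc transpose data=have_sorted out=wide(drop=_name_);
    by trip gear depth;
    id species;
    var number;
run;

/* Species not recorded on a trip come out missing; set them to 0 */
data wide;
    set wide;
    array fishcols {*} fish:;
    do i = 1 to dim(fishcols);
        if missing(fishcols{i}) then fishcols{i} = 0;
    end;
    drop i;
run;
```

This assumes each species appears at most once per trip. If a trip can have several rows for the same species, total them first (for example with PROC SUMMARY), because PROC TRANSPOSE will complain about duplicate ID values within a BY group.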