I found an excellent example of how to use SAS PROC CLUSTER to do a cluster analysis. My problem is that my data is set up in a very different way. I am hoping someone can give me advice on how to rearrange my data and then proceed with SAS PROC CLUSTER. I provide details below.
The example that I am following has the object that you want to organize (fish species in this case) listed in the first column. For example,
Variable X Variable X2 Variable X3 Variable X4
Fish species ### ### #### ####
I am trying to do the same thing. I want to look for clusters of fish species. However, my data is set up in a different way. I want to see what fish species are caught with other fish species on fishing trips. So each row is a trip. An example is below.
Trip    Fish species 1  Fish species 2  Fish species 3  Fish species 4  Depth  Gear
Trip A  10              3               6               9               22     trap
Trip B  7               8               11              7               31     trap
Trip C  6               0               0               4               55     hook
My objective is to do a cluster analysis that will group the fish species by looking at the data from fishing trips. For example, maybe fish species 4 should be grouped with fish species 1 because they are frequently caught together and also at similar depths and gear.
I am stuck so any help would be greatly appreciated.
This doesn't strike me as a cluster analysis either.
It strikes me more as equivalent to a "market basket" analysis, where companies analyze which products are purchased together.
But I have never done a market basket analysis. So that's all I can say.
To cluster the species and illustrate their association with gear and depth, I would start with correspondence analysis. Try this:
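/* One row per trip: counts for each species, plus depth and gear */
data fishNumbers;
    length tripID $16;
    input Trip $ Fish_species_1 Fish_species_2 Fish_species_3 Fish_species_4 Depth Gear $;
    /* Build a label such as "A trap 22" so gear and depth show up on the plot */
    tripID = catx(" ", Trip, Gear, Depth);
    drop Trip Gear;
    datalines;
A 10 3 6 9 22 trap
B 7 8 11 7 31 trap
C 6 0 0 4 55 hook
...
;
/* Correspondence analysis: trips are the rows, species are the columns */
proc corresp data=fishNumbers;
    var Fish_species_:;
    id tripID;
run;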
Look at the resulting graph. Distances between species, and between species and trips, indicate co-occurrence (correspondence).
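If an explicit grouping from PROC CLUSTER is still wanted, one option is to cluster the species coordinates that PROC CORRESP can save with OUTC=. This is an assumption on my part, not something the suggestion above requires. Below is a minimal sketch using only the three trips shown. The data set names coords, speciesCoords, tree and speciesClusters are made up for illustration. Ward's method and two clusters are arbitrary choices. I left out the ID statement so that the point labels should land in _NAME_; run PROC PRINT on coords to confirm before relying on it.

```sas
data fishNumbers;
    input Trip $ Fish_species_1 Fish_species_2 Fish_species_3 Fish_species_4 Depth Gear $;
    datalines;
A 10 3 6 9 22 trap
B 7 8 11 7 31 trap
C 6 0 0 4 55 hook
;

/* OUTC= saves the coordinates of the row (trip) and column (species) points */
proc corresp data=fishNumbers outc=coords;
    var Fish_species_:;
run;

/* Keep only the column points, i.e. the species (assumes labels are in _NAME_) */
data speciesCoords;
    set coords;
    where _type_ = 'VAR';
    length species $32;
    species = _name_;
    keep species Dim1 Dim2;
run;

/* Hierarchical clustering of the species on the first two dimensions */
proc cluster data=speciesCoords method=ward outtree=tree noprint;
    var Dim1 Dim2;
    id species;
run;

/* Cut the tree into two clusters (arbitrary) and list the membership */
proc tree data=tree nclusters=2 out=speciesClusters noprint;
run;

proc print data=speciesClusters;
run;
```

With only three trips the result is a toy. On real data, check how much inertia the first two dimensions explain before trusting clusters built on them.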
Very helpful PG. Thanks.
In my first post I had an idea of how I wanted to organize the data. However, this weekend I failed at getting my data organized in the way that I like. I am hoping somebody can help me reorganize my data. My attempts at proc summary have been unsuccessful.
I have multiple rows for the same trip. Each row is for a specific fish species. Here is an example of what I have:
Trip Gear Species depth pounds
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
I want SAS to summarize my landings in pounds for each species for each trip, so that each row is one trip. I want SAS to reshape my data into the format shown below:
Trip Gear Depth fish1 fish2 fish3 fish4
A trap 55 12 4 3 0
B trap 40 0 18 0 16
C hook 59 0 0 21 5
Any help would be greatly appreciated.
What you need for correspondence analysis are fish numbers (counts), not pounds. Assuming you have numbers:
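/* One row per trip and species; "number" is the count of fish caught */
data have;
    length trip gear species $16;
    input Trip $ Gear $ Species $ depth number;
    datalines;
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
;
proc sort data=have; by trip gear depth species; run;
/* Label each trip with its gear and depth for the plot */
data fish;
    set have;
    tripID = catx(" ", Trip, Gear, depth);
run;
/* Cross-tabulate trips by species, weighting each row by its count */
proc corresp data=fish;
    tables tripID, species;
    weight number;
run;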
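For the one-row-per-trip layout asked for above, PROC TRANSPOSE may be enough. This is a minimal sketch that assumes each trip has at most one row per species. If a species can appear more than once within a trip, the values would need to be summed first, for example with PROC SUMMARY. The data set names landings and wide are made up for illustration. The sketch uses pounds, as in the question, but the same steps apply to counts.

```sas
data landings;
    length trip gear species $16;
    input trip $ gear $ species $ depth pounds;
    datalines;
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
;

/* BY-group processing in PROC TRANSPOSE needs sorted input */
proc sort data=landings;
    by trip gear depth;
run;

/* One output row per trip; each species value becomes a column name */
proc transpose data=landings out=wide(drop=_name_);
    by trip gear depth;
    id species;
    var pounds;
run;

/* Species not caught on a trip come out missing; recode them to 0 */
data wide;
    set wide;
    array spp{*} fish:;
    do i = 1 to dim(spp);
        if missing(spp{i}) then spp{i} = 0;
    end;
    drop i;
run;

proc print data=wide noobs;
run;
```

The fish: prefix list relies on every species column starting with "fish". With real species names, list the columns explicitly, or use _numeric_ and leave depth out of the recode.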