elopomorph
Calcite | Level 5

I found an excellent example of how to use SAS PROC CLUSTER to do a cluster analysis.  My problem is that my data is set up in a very different way.  I am hoping someone can give me advice on how to rearrange my data and then proceed with SAS PROC CLUSTER.  I provide details below.

 

The example that I am following has the object that you want to organize (fish species in this case) listed in the first column.  For example,

                 Variable X    Variable X2    Variable X3    Variable X4
Fish species     ###           ###            ####           ####

 

I am trying to do the same thing.  I want to look for clusters of fish species.  However, my data is set up in a different way.  I want to see which fish species are caught together on fishing trips, so each row is a trip.  An example is below.

 

          Fish species 1    Fish species 2    Fish species 3    Fish species 4    Depth    Gear
Trip A    10                3                 6                 9                 22       trap
Trip B    7                 8                 11                7                 31       trap
Trip C    6                 0                 0                 4                 55       hook

 

My objective is to do a cluster analysis that will group the fish species by looking at the data from fishing trips.  For example, maybe fish species 4 should be grouped with fish species 1 because they are frequently caught together, at similar depths and with the same gear.

 

I am stuck so any help would be greatly appreciated.

6 REPLIES
Reeza
Super User
I think your data is organized appropriately. I question the cluster analysis methodology, but I'm having a hard time saying why I think that.
PaigeMiller
Diamond | Level 26

This doesn't strike me as a cluster analysis either.

 

It strikes me more as equivalent to a "market basket" analysis, where companies analyze which products are purchased together.

 

But I have never done a market basket analysis. So that's all I can say.

--
Paige Miller
PGStats
Opal | Level 21

To cluster the species and illustrate their association with gear and depth, I would start with correspondence analysis. Try this:

 

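data fishNumbers;
    length tripID $16;
    input Trip $ Fish_species_1 Fish_species_2 Fish_species_3 Fish_species_4 Depth Gear $;
    /* Row label such as "A trap 22", so gear and depth appear on the plot */
    tripID = catx(" ", Trip, Gear, Depth);
    drop Trip Gear;
datalines;
A  10 3 6 9 22 trap
B  7 8 11 7 31 trap
C  6 0 0 4 55 hook
...
;

/* Correspondence analysis of the trip-by-species count table */
proc corresp data=fishNumbers;
    var Fish_species_:;   /* every column whose name starts with Fish_species_ */
    id tripID;            /* label each row (trip) with its tripID */
run;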

Look at the resulting graph. The distances between species, and between species and trips, indicate co-occurrence (correspondence).
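If explicit groups of species are wanted rather than a visual reading of the plot, one possible follow-up is to save the coordinates and cluster the species points. This is only a sketch, not part of the analysis above. It assumes the fishNumbers data set has been read in full; the data set names coords, tree and groups, the choice of Ward's method, and the request for two clusters are all illustrative assumptions.

```sas
/* Sketch only: save the correspondence analysis coordinates */
proc corresp data=fishNumbers outc=coords;
    var Fish_species_:;
    id tripID;
run;

/* In the OUTC= data set, rows with _TYPE_ = 'VAR' hold the column (species) points */
proc cluster data=coords(where=(_type_ = 'VAR')) method=ward outtree=tree;
    var Dim1 Dim2;    /* coordinates on the first two dimensions */
    id _name_;        /* species variable names */
run;

/* Cut the tree into an assumed number of clusters (2 here, purely illustrative) */
proc tree data=tree nclusters=2 out=groups noprint;
    id _name_;
run;
```

The number of clusters is a judgment call; the dendrogram from PROC CLUSTER is the place to look before fixing it.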

PG
elopomorph
Calcite | Level 5

Very helpful PG.  Thanks.  

 

In my first post I had an idea of how I wanted to organize the data.  However, this weekend I failed at getting my data organized in the way that I like.  I am hoping somebody can help me reorganize my data.  My attempts at proc summary have been unsuccessful.  

 

I have multiple rows for the same trip.  Each row is for a specific fish species.  Here is an example of what I have:

Trip    Gear    Species    depth    pounds
A       trap    fish1      55       12
A       trap    fish2      55       4
A       trap    fish3      55       3
B       trap    fish2      40       18
B       trap    fish4      40       16
C       hook    fish3      59       21
C       hook    fish4      59       5

 

I want SAS to summarize my landings in pounds for each species on each trip, so that each row is one trip.  I want SAS to reshape my data into the format shown below:

Trip    Gear    Depth    fish1    fish2    fish3    fish4
A       trap    55       12       4        3        0
B       trap    40       0        18       0        16
C       hook    59       0        0        21       5

 

Any help would be greatly appreciated.


PGStats
Opal | Level 21

What you need for correspondence analysis are fish numbers, not pounds. Assuming you have numbers:

 

data have;
    length trip gear species $16;
    input Trip $ Gear $ Species $ depth number;
datalines;
A          trap         fish1           55             12
A          trap         fish2           55               4
A          trap         fish3           55               3
B          trap         fish2           40              18
B          trap         fish4           40              16
C          hook       fish3           59               21
C          hook       fish4           59               5
;

proc sort data=have; by trip gear depth species; run;

/* Row label such as "A trap 55", combining trip, gear and depth */
data fish;
    set have;
    tripID = catx(" ", Trip, Gear, depth);
run;

/* Build the tripID-by-species table, weighting each row by its fish count */
proc corresp data=fish;
    tables tripID, species;
    weight number;
run;
PG
Reeza
Super User
You can modify your data using a proc transpose.
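A minimal sketch of that approach, assuming the long data is in a data set named have with the columns shown in the question (Trip, Gear, Species, depth, pounds). The output names sums and wide are made up for illustration, and the fish: prefix only matches the placeholder species names used in this thread.

```sas
/* Sketch only: total pounds per trip and species, in case a species
   appears on more than one row for the same trip */
proc summary data=have nway;
    class trip gear depth species;
    var pounds;
    output out=sums(drop=_type_ _freq_) sum=pounds;
run;

/* One row per trip, one column per species */
proc transpose data=sums out=wide(drop=_name_);
    by trip gear depth;
    id species;
    var pounds;
run;

/* Species not landed on a trip come out as missing; set them to 0 */
data wide;
    set wide;
    array f {*} fish:;   /* assumes the species columns all start with "fish" */
    do i = 1 to dim(f);
        if missing(f{i}) then f{i} = 0;
    end;
    drop i;
run;
```

The PROC SUMMARY step matters because PROC TRANSPOSE stops with an error if the same ID value occurs twice within a BY group. With real species names, replace fish: with an explicit list of the species columns.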
