
organizing my data to do a cluster analysis in SAS

Occasional Contributor
Posts: 8


I found an excellent example of how to use SAS PROC CLUSTER to do a cluster analysis. My problem is that my data is set up very differently. I am hoping someone can give me advice on how to rearrange my data and then proceed with SAS PROC CLUSTER. I provide details below.

 

The example that I am following has the object that you want to organize (fish species in this case) listed in the first column.  For example,

                 Variable X   Variable X2   Variable X3   Variable X4
Fish species     ###          ###           ####          ####

 

I am trying to do the same thing.  I want to look for clusters of fish species.  However, my data is set up in a different way.  I want to see what fish species are caught with other fish species on fishing trips.  So each row is a trip.  An example is below.  

 

          Fish species 1   Fish species 2   Fish species 3   Fish species 4   Depth   Gear
Trip A    10               3                6                9                22      trap
Trip B    7                8                11               7                31      trap
Trip C    6                0                0                4                55      hook

 

My objective is to do a cluster analysis that will group the fish species by looking at the data from fishing trips.  For example, maybe fish species 4 should be grouped with fish species 1 because they are frequently caught together and also at similar depths and gear.    

 

I am stuck so any help would be greatly appreciated.

Super User
Posts: 17,831

Re: organizing my data to do a cluster analysis in SAS

I think your data is organized appropriately...I question the cluster analysis methodology....but I'm having a hard time saying why I think that..
Trusted Advisor
Posts: 1,615

Re: organizing my data to do a cluster analysis in SAS

This doesn't strike me as a cluster analysis either.

 

It strikes me more as equivalent to a "market basket" analysis, where companies analyze which products are purchased together.

 

But I have never done a market basket analysis. So that's all I can say.

Respected Advisor
Posts: 4,649

Re: organizing my data to do a cluster analysis in SAS

To cluster the species and illustrate their association with gear and depth, I would start with correspondence analysis. Try this:

 

/* One row per trip; tripID combines trip, gear and depth into a single label */
data fishNumbers;
length tripID $16;
input Trip $ Fish_species_1 Fish_species_2 Fish_species_3 Fish_species_4 Depth Gear $;
tripID = catx(" ", Trip, Gear, depth);
drop trip gear;
datalines;
A  10 3 6 9 22 trap 
B  7 8 11 7 31 trap
C  6 0 0 4 55 hook
...
;

/* Correspondence analysis: species counts as columns, trips as rows */
proc corresp data=fishNumbers;
var Fish_species_:;
id tripID;
run;

Look at the resulting graph. Distances among species, and between species and trips, indicate co-occurrence (correspondence).

PG
Occasional Contributor
Posts: 8

Re: organizing my data to do a cluster analysis in SAS

Very helpful PG.  Thanks.  

 

In my first post I had an idea of how I wanted to organize the data.  However, this weekend I failed at getting my data organized in the way that I like.  I am hoping somebody can help me reorganize my data.  My attempts at proc summary have been unsuccessful.  

 

I have multiple rows for the same trip. Each row is for a specific fish species. Here is an example of what I have:

Trip   Gear   Species   depth   pounds
A      trap   fish1     55      12
A      trap   fish2     55      4
A      trap   fish3     55      3
B      trap   fish2     40      18
B      trap   fish4     40      16
C      hook   fish3     59      21
C      hook   fish4     59      5

 

I want SAS to summarize my landings in pounds for each species on each trip, so that each row is one trip. I want SAS to reshape my data into the format shown below:

Trip   Gear   Depth   fish1   fish2   fish3   fish4
A      trap   55      12      4       3       0
B      trap   40      0       18      0       16
C      hook   59      0       0       21      5

 

Any help would be greatly appreciated.

Respected Advisor
Posts: 4,649

Re: organizing my data to do a cluster analysis in SAS

What you need for correspondence analysis are fish numbers, not pounds. Assuming you have numbers:

 

/* Long format: one row per trip and species */
data have;
length trip gear species $16;
input Trip Gear Species depth number;
datalines;
A          trap         fish1           55             12
A          trap         fish2           55               4
A          trap         fish3           55               3
B          trap         fish2           40              18
B          trap         fish4           40              16
C          hook       fish3           59               21
C          hook       fish4           59               5
;

proc sort data=have; by trip gear depth species; run;

/* Build a single row label from trip, gear and depth */
data fish;
set have;
tripID = catx(" ", Trip, Gear, depth);
run;

/* Cross-tabulate trips by species, weighting each row by the fish count */
proc corresp data=fish;
tables tripID, species;
weight number;
run;
PG
Super User
Posts: 17,831

Re: organizing my data to do a cluster analysis in SAS

You can modify your data using a proc transpose.
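
For what it's worth, here is a minimal sketch of how that reshaping might look, using the long-format example posted above. The data set names `totals` and `wide` are just placeholders I made up, and the PROC SUMMARY step assumes you want pounds added up when a species shows up on more than one row for the same trip. Note that this reshapes pounds, as requested; per PG's comment, correspondence analysis would want counts instead.

```sas
/* Long-format landings: one row per trip and species (example data from the thread) */
data have;
    length trip gear species $16;
    input trip $ gear $ species $ depth pounds;
    datalines;
A trap fish1 55 12
A trap fish2 55 4
A trap fish3 55 3
B trap fish2 40 18
B trap fish4 40 16
C hook fish3 59 21
C hook fish4 59 5
;

/* Total pounds per trip and species; NWAY keeps only the full cross-classification.
   The output is ordered by the CLASS variables, so no separate sort is needed. */
proc summary data=have nway;
    class trip gear depth species;
    var pounds;
    output out=totals(drop=_type_ _freq_) sum=pounds;
run;

/* One row per trip, one column per species */
proc transpose data=totals out=wide(drop=_name_);
    by trip gear depth;
    id species;
    var pounds;
run;

/* Species not landed on a trip come out missing; recode them to 0 */
data wide;
    set wide;
    array f {*} fish:;
    do i = 1 to dim(f);
        if missing(f{i}) then f{i} = 0;
    end;
    drop i;
run;

proc print data=wide noobs;
run;
```

The `fish:` prefix list in the ARRAY statement only works because the species values in this example all start with "fish"; with real species names you would need to list the columns some other way (for example, `_numeric_` after excluding depth). Also, rows with a missing depth would be dropped by PROC SUMMARY unless you add the MISSING option.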