Pthaon
Calcite | Level 5

Hello everyone,

I work with SAS 9.4 and I'm currently analysing some data about roots.

But I have encountered a problem with the analysis of these graphs:

SGPlot163.png

 

This graph represents the Euclidean distance (eucl) from the origin of the root over time (DegDay). The origin isn't included in the graph; I'm focusing on the growth in the later stage of the plant, where it's more linear.

As we can see, there seem to be 5 roots growing linearly, and my goal is to group the points of each root together. For each point I have additional data: position X and Y:

 

SGPlot164.png

 

So I tried some clustering methods on X, Y and time (degday) with:

ods graphics on;

proc cluster data=tmp method=fle ccc pseudo print=15  outtree = tree
   plots( maxpoints=300)=den(height=rsq) STANDARD ;
   BY exp ID;
   var x y  degday ;
   copy  eucl ;
run;

ods graphics off;

PROC TREE DATA = tree OUT = tree2 HEIGHT= RSQ LEVEL = 0.99 ;
RUN;

PROC SORT DATA = tree (KEEP = _NAME_ x y diameter degday eucl) OUT = trees;
  BY _NAME_;
RUN;

PROC SORT DATA = tree2;
  BY _NAME_;
RUN;

DATA tree3;
  MERGE trees tree2 (IN = b);
  BY _NAME_;
  IF b;
RUN;

PROC SGPLOT DATA = tree3;
  SCATTER X = degday Y = eucl / GROUP = cluster;
RUN;

SGPlot165.png

 

Then for each cluster (with 4 or more points) I calculated the slope and intercept and retried clustering with approximately the same method, and got this:

SGPlot169.png

Better, but still not OK. And to add to the challenge, I have this type of data for 2000 root systems.

So my questions are:

Is there a better method than clustering for linear data like this?

Or is the clustering method fine, and I'm just not doing it the right way?

 

I know my explanations might not be completely clear and my English isn't the best, so do not hesitate to ask for any additional information or clarification.

 

Thanks in advance,

 

Patrick

5 REPLIES
Rick_SAS
SAS Super FREQ
  • In the data that you show, are there 5 plants? 
  • Do the observations correspond to the growth of those five plants over time (approximately 80 days)?
  • Do the (X,Y) positions represent the coordinates of some feature relative to the root, so that eucl = sqrt(X**2 + Y**2)?

 

Cluster analysis tries to find data that are grouped near some central location (the "center" of the cluster). Your observations are not clumped, they are strung out in a line. 
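
As a hedged aside on the point above: many linkage methods (Ward's, for example) favour compact clusters, whereas single linkage joins each point to its nearest neighbour and can therefore follow elongated groups. The sketch below is only an illustration, not a method proposed in this thread. The data set name oneplant (the rows of a single plant) and the choice of 5 clusters are assumptions, and single linkage is known to break down when lines touch or cross, or when a line has large gaps.

```sas
/* Assumption: ONEPLANT holds the observations of a single plant.        */
/* The scales of degday and eucl are assumed comparable; if they are     */
/* not, the STD option of PROC CLUSTER standardizes the variables.       */
proc cluster data=oneplant method=single outtree=tree_single noprint;
   var degday eucl;   /* cluster in the plane where roots look like lines */
   copy x y;          /* carry the coordinates into the tree data set     */
run;

/* Cut the tree at an assumed number of roots (5 is a guess). */
proc tree data=tree_single out=lines nclusters=5 noprint;
   copy degday eucl x y;
run;

/* Inspect the grouping visually. */
proc sgplot data=lines;
   scatter x=degday y=eucl / group=cluster;
run;
```

Because the number of roots per plant is not known in advance, NCLUSTERS= would have to be chosen per plant, for instance by inspecting the dendrogram.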

 

Are you trying to group together plants for which the growth curves (eucl vs time) are similar? If so, fit a regression line to each growth curve and then cluster the (Intercept, Slope) pairs in parameter space. That will indicate which of your 2000 plants have similar growth curves.
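
A minimal sketch of that suggestion, under stated assumptions: the data set curves, the identifier curve_id (one value per growth curve), and the choice of 4 clusters are all hypothetical and not taken from the thread. PROC REG writes one row of estimates per BY group, with the slope stored in a variable named after the regressor (degday).

```sas
/* Assumption: CURVES has one row per observation and CURVE_ID tells */
/* which growth curve each row belongs to.                           */
proc sort data=curves;
   by curve_id;
run;

/* One straight-line fit per curve; estimates go to EST. */
proc reg data=curves outest=est(keep=curve_id Intercept degday) noprint;
   by curve_id;
   model eucl = degday;
run;
quit;

/* Put intercept and slope on a common scale before clustering. */
proc stdize data=est out=est_std method=std;
   var Intercept degday;
run;

/* Cluster the (Intercept, Slope) pairs; 4 clusters is an arbitrary choice. */
proc fastclus data=est_std maxclusters=4 out=curve_clusters noprint;
   var Intercept degday;
run;
```

The CLUSTER variable in curve_clusters then indicates which curves have similar fitted lines.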

Pthaon
Calcite | Level 5
  • No, it's 1 plant with 5 different roots growing. I have that type of graph for each plant, but I don't know in advance how many roots there will be for each plant.
  • Yes, it's expressed in thermal time, which represents the accumulation of the temperature of each day; the plants are more influenced by temperature than by time. But we can interpret it as the growth of the roots of one plant over time.
  • Yes, it's the 2D spatial position of the tip of a root at a given thermal time (degday). But the image processing software does not always detect the tip, since the roots can sometimes overlap. And yes, eucl = sqrt(X**2 + Y**2).

 

What I'm trying to do is to group the points of each line together (each line would represent a root), since nothing in my data differentiates them. That will then allow me to calculate the growth curve of each root.

 

And I realise that the cluster method might not be the best to put together points strung in a line. Do you know any other method that might be more suited to do that?

 

Rick_SAS
SAS Super FREQ

I see. So

1. The data is collected by "image processing software," which means you do not know which root a given (X,Y, time) observation belongs to.

2. You have no idea how many roots there are.

3. Some roots might grow behind another, or even intertwine.

4. You only have (X,Y, time) data when the real problem is (X,Y,Z,time). 

 

 

To me this sounds like a problem in computer vision or image processing rather than statistics. Good luck.

Pthaon
Calcite | Level 5

Yes, you got the idea. Image processing might be better, but I lack knowledge of that type of software. The easy ones are not really suited to a series of photos over time.

 

In your opinion there is no way to regroup those lines of points together?

 

But anyway, thanks for your quick response!

 

 

Rick_SAS
SAS Super FREQ

> In your opinion there is no way to regroup those lines of points together? 

I don't know. I am just one person with limited knowledge. I am not aware of a pre-packaged algorithm that groups this kind of data.

 

Given enough time and effort, you could probably construct and implement a heuristic algorithm that works for most of the data. You would probably need to use a high-level language like SAS/IML. Whether you want to pursue it depends on how much time you have and how important it is to solve this problem.

 

If you decide to pursue it, I suggest you post the sample data for this 5-root problem. Someone might get interested and work on the problem.
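
For readers who want a starting point, here is one possible heuristic of that kind, sketched in SAS/IML. It is not a method proposed in this thread and has not been validated on root data. It is a RANSAC-style loop: pick two unassigned points at random, count how many unassigned points lie within a tolerance of the line through them, and peel off the best-supported line as one root. The data set name oneplant, the tolerance tol, the minimum support minPts, and the limits nTrials and maxRoots are all assumptions that would need tuning.

```sas
proc iml;
/* Assumption: ONEPLANT holds the observations of a single plant. */
use oneplant;
read all var {degday eucl} into M;
close oneplant;

call randseed(2024);
n        = nrow(M);
rootId   = j(n, 1, 0);   /* 0 = not yet assigned to a root            */
tol      = 0.5;          /* assumed max vertical distance to a line   */
minPts   = 4;            /* assumed minimum number of points per root */
nTrials  = 500;          /* random point pairs tried per root         */
maxRoots = 10;           /* safety limit on the number of roots       */

nFound = 0;
keepGoing = 1;
do while (keepGoing);
   avail = loc(rootId = 0);                 /* indices still unassigned */
   if (ncol(avail) < minPts) | (nFound >= maxRoots) then keepGoing = 0;
   else do;
      best = 0;
      do trial = 1 to nTrials;
         pair = sample(avail, 2, "WOR");    /* two distinct candidates */
         dx = M[pair[2], 1] - M[pair[1], 1];
         if dx ^= 0 then do;
            slope = (M[pair[2], 2] - M[pair[1], 2]) / dx;
            icpt  = M[pair[1], 2] - slope # M[pair[1], 1];
            resid = abs(M[avail, 2] - icpt - slope # M[avail, 1]);
            cnt   = sum(resid <= tol);
            if cnt > best then do;          /* best-supported line so far */
               best    = cnt;
               bestIdx = avail[loc(resid <= tol)];
            end;
         end;
      end;
      if best >= minPts then do;            /* accept the line as a root */
         nFound = nFound + 1;
         rootId[bestIdx] = nFound;
      end;
      else keepGoing = 0;                   /* nothing line-like is left */
   end;
end;

degday = M[, 1];
eucl   = M[, 2];
create root_labels var {degday eucl rootId};
append;
close root_labels;
quit;
```

Points that end with rootId = 0 were not assigned to any line. Running the same loop for each plant would avoid fixing the number of roots in advance, but crossing or nearly parallel roots could still be merged or split incorrectly.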
