Help Requested: Grouping Data with Do Loops/ Arrays based on Spatio-Temporal Coordinates

Occasional Contributor
Posts: 10

I'm trying to write a formula that would do the following:

1) Identify which points start new clusters under given conditions: the Euclidean distance is sufficiently small (<600) and the time difference is sufficiently large (>15).  The conditional statements are not the current issue; renaming the BaseClusterID values is.

2) I want to create a new cluster ID for each cluster for which this condition holds (e.g., ID points 3, 223, 10344, and 16078 all satisfy the above conditions, so I'd want each of them given a different cluster ID: Cluster 1, 2, 3, and 4).

3) Every sequential ID point which satisfies this condition falls in the same cluster (so points 4 and 5 are in the same cluster as point 3).

I wanted to know if it was possible to achieve this renaming of clusters with Do loops and Arrays.  Any assistance or direction would be much appreciated.

Here is a sample of what I have and what I am looking for:

Dataset One (what I have):

ID     BaseClusterID     DeltaTime     EucDistance
1      cluster_0         3             70
2      cluster_0         1             4000
3      cluster_0         22            25
4      cluster_0         2             80
5      cluster_0         2             200

...

Dataset Two (what I'm looking for):

ID     BaseClusterID     DeltaTime     EucDistance
1      cluster_0         3             70
2      cluster_0         1             4000
3      cluster_1         22            25
4      cluster_1         2             80
5      cluster_1         2             200

...

Super User
Posts: 11,343

Re: Help Requested: Grouping Data with Do Loops/ Arrays based on Spatio-Temporal Coordinates

I think you'll need to provide a bit more information about the input data. From your dataset one I have no way/reason to tell that ID value 4 should be a different cluster than ID 1. I have to assume there are some groups of coordinates that are used as the base and another set compared with those and possibly there is a rule about which base(?) coordinates are considered when deciding which cluster value assignment is considered.

Occasional Contributor
Posts: 10

Re: Help Requested: Grouping Data with Do Loops/ Arrays based on Spatio-Temporal Coordinates

I understand these concerns and appreciate the prompt response.  So for this project, I want these data to be grouped based on proximity in time and space. 

ExampleA.png

So imagine we're talking about points 1 - 9.  Points 3 - 9 form a cluster.  But if there was missing data between points 3 and 4, there might be a larger gap in time.  It is still evident there is a cluster there, however, as all points are sufficiently close to one another.  I'm looking to detect that Point 3 is the first point in this cluster and to recognize that all the other points are sufficiently close in time and space to also belong to this cluster.

Super User
Posts: 11,343

Re: Help Requested: Grouping Data with Do Loops/ Arrays based on Spatio-Temporal Coordinates

Without explicit data for coordinates I think I would approach this using Proc Fastclus. Possibly looking at creating the potential geographic clusters first and then applying the time element afterwards.

Occasional Contributor
Posts: 10

Re: Help Requested: Grouping Data with Do Loops/ Arrays based on Spatio-Temporal Coordinates

I'll definitely look into Proc Fastclus, thank you very much.  I should also mention that I do have explicit data for the coordinates, and this step takes place after running an ST-DBSCAN analysis.  I'm just wondering if it is possible to rename points 3 - 9 using Do loops/ Arrays.

Super User
Posts: 11,343

Re: Help Requested: Grouping Data with Do Loops/ Arrays based on Spatio-Temporal Coordinates

If you have a single column like Cluster, assigning values is easy. You should note that FASTCLUS by default generates values like CLUSTER1, CLUSTER2, etc. to identify the groups of coordinates it recommends as a cluster. So your loop/array may not be needed.

Occasional Contributor
Posts: 10

Re: Help Requested: Grouping Data with Do Loops/ Arrays based on Spatio-Temporal Coordinates

I understand automating the function is a quick and simple way to identify the groups of coordinates.  However, this is post-cluster scan analysis for any datum that might've fallen through the cracks, an aspect that I feel is best handled through manual detection.
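
For reference, the renumbering shown in Dataset Two may not need arrays or an explicit DO loop at all, because the DATA step already processes one row at a time and a retained counter can carry the current cluster number forward. The following is only a minimal sketch under stated assumptions: HAVE and WANT are placeholder dataset names, cluster_num is a helper variable introduced for illustration, the rows are assumed to be sorted by ID already, and the thresholds are the <600 and >15 values from the original question.

```sas
/* Sketch only: HAVE and WANT are placeholder dataset names,
   and the rows are assumed to be in ID order already. */
data want;
    length BaseClusterID $ 20;   /* leave room for cluster_10, cluster_100, ... */
    set have;
    retain cluster_num 0;        /* hypothetical helper: running cluster counter */

    /* Close in space but far apart in time: this row opens a new cluster */
    if EucDistance < 600 and DeltaTime > 15 then cluster_num + 1;

    /* Each row takes the current counter until the next trigger row */
    BaseClusterID = cats('cluster_', cluster_num);
    drop cluster_num;
run;
```

On the five sample rows, only ID 3 meets both thresholds, so IDs 1 and 2 stay in cluster_0 and IDs 3 through 5 become cluster_1. If a later point should instead drop back out of a cluster (for example, when EucDistance is large again), that rule would need an additional condition that the thread does not specify.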
