LKlein88
Calcite | Level 5

I'm trying to write a formula that would do the following:

1) Identify which points start new clusters under given conditions: the Euclidean distance is sufficiently small (< 600) and the time difference is sufficiently large (> 15). The conditional statements are not the current issue; renaming the BaseClusterID variable is.

2) I want to create a new cluster ID for each point where this condition holds. For example, ID points 3, 223, 10344, and 16078 all satisfy the conditions above, so I'd want each of them to get a different cluster ID (Cluster 1, 2, 3, and 4).

3) Every subsequent ID point that stays sufficiently close falls in the same cluster (so points 4 and 5 are in the same cluster as point 3).

I'd like to know whether this renaming of clusters can be done with DO loops and arrays. Any assistance or direction would be much appreciated.

Here is a sample of what I have and what I am looking for:

Dataset One (what I have):

ID   BaseClusterID   DeltaTime   EucDistance
1    cluster_0       3           70
2    cluster_0       1           4000
3    cluster_0       22          25
4    cluster_0       2           80
5    cluster_0       2           200
...

Dataset Two (what I'm looking for):

ID   BaseClusterID   DeltaTime   EucDistance
1    cluster_0       3           70
2    cluster_0       1           4000
3    cluster_1       22          25
4    cluster_1       2           80
5    cluster_1       2           200
...
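For what it's worth, here is a minimal sketch of the row-by-row relabeling as a DATA step. Because a DATA step reads one row per iteration, the value that has to carry from one row to the next comes from RETAIN rather than from an ARRAY (SAS arrays group variables within a single row). The thresholds (600 and 15) are the ones from the post. The dataset names have/want, the new variable NewClusterID, and the rule that a point 600 or more units away ends the current cluster are assumptions for illustration, since the post doesn't say when a cluster ends. It also assumes the rows are already sorted by ID in time order.

```sas
/* Sample rows from Dataset One (IDs 1-5 only) */
data have;
   input ID BaseClusterID :$9. DeltaTime EucDistance;
   datalines;
1 cluster_0 3 70
2 cluster_0 1 4000
3 cluster_0 22 25
4 cluster_0 2 80
5 cluster_0 2 200
;
run;

data want;
   set have;                         /* assumes rows are sorted by ID */
   length NewClusterID $ 16;         /* hypothetical new variable */
   retain cluster_num 0 in_cluster 0;

   /* Start condition from the post: large time gap, small distance */
   if DeltaTime > 15 and EucDistance < 600 then do;
      cluster_num + 1;               /* sum statement: next cluster number */
      in_cluster = 1;
   end;
   /* ASSUMPTION: a point 600+ units away ends the current cluster */
   else if EucDistance >= 600 then in_cluster = 0;

   if in_cluster then NewClusterID = cats('cluster_', cluster_num);
   else NewClusterID = BaseClusterID;

   drop cluster_num in_cluster;
run;
```

Writing to a new variable avoids truncation: in this sketch BaseClusterID is read with length 9, so assigning cluster_10 to it would be cut off. With the sample rows, IDs 1 and 2 keep cluster_0 and IDs 3 to 5 get cluster_1, matching Dataset Two.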

6 replies
ballardw
Super User

I think you'll need to provide a bit more information about the input data. From your Dataset One I have no way or reason to tell that ID value 4 should be in a different cluster than ID 1. I have to assume that some groups of coordinates are used as the base, another set is compared with those, and possibly there is a rule about which base(?) coordinates are considered when deciding the cluster assignment.

LKlein88
Calcite | Level 5

I understand these concerns and appreciate the prompt response.  So for this project, I want these data to be grouped based on proximity in time and space. 

[Image: ExampleA.png]

So imagine we're talking about points 1 - 9. Points 3 - 9 form a cluster. But if there were missing data between points 3 and 4, there might be a larger gap in time. A cluster is still evident there, however, because all the points are sufficiently close to one another. I'm looking to detect that point 3 is the first point in this cluster and to recognize that all the other points are sufficiently close in time and space to also belong to this cluster.

ballardw
Super User

Without explicit data for coordinates, I think I would approach this using PROC FASTCLUS, possibly creating the potential geographic clusters first and then applying the time element afterwards.

LKlein88
Calcite | Level 5

I'll definitely look into PROC FASTCLUS, thank you very much. I should also mention that I do have explicit data for the coordinates, and these steps take place after running an ST-DBSCAN analysis. I'm just wondering whether it is possible to rename points 3 - 9 using DO loops/arrays.

ballardw
Super User

If you have a single column like Cluster, assigning values is easy. You should note that FASTCLUS by default will generate values like CLUSTER1, CLUSTER2, etc. to identify the groups of coordinates it recommends as a cluster. So your loop/array may not be needed.

LKlein88
Calcite | Level 5

I understand automating the function is a quick and simple way to identify the groups of coordinates.  However, this is post-cluster scan analysis for any datum that might've fallen through the cracks, an aspect that I feel is best handled through manual detection.
