Hello SAS Communities. I have a question regarding consolidating clusters.
Suppose I have the following dataset where LagLatM and LagLongM at the functions lag(LatM) and lag(LongM) and Change is sqrt((LatM - LagLatM)**2 + (LongM - LagLongM)**2):
ID | cluster | LatM | Long | LagLatM | LagLongM | Change |
7729 | cluster_5 | 8000 | 6000 | . | . | . |
7730 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7731 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7732 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7733 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7734 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7735 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7736 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7737 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7738 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7739 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7740 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7741 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7742 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7743 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7744 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7745 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7746 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7747 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7748 | cluster_6 | 8000 | 6000 | 8000 | 6000 | 0 |
7749 | cluster_7 | 8000 | 6000 | 8000 | 6000 | 0 |
7750 | cluster_7 | 8000 | 6000 | 8000 | 6000 | 0 |
7751 | cluster_7 | 8000 | 6000 | 8000 | 6000 | 0 |
7752 | cluster_7 | 8000 | 6000 | 8000 | 6000 | 0 |
7753 | cluster_7 | 8000 | 6000 | 8000 | 6000 | 0 |
7754 | cluster_7 | 8000 | 6000 | 8000 | 6000 | 0 |
7755 | cluster_7 | 8000 | 6000 | 8000 | 6000 | 0 |
7756 | cluster_7 | 8000 | 6000 | 8000 | 6000 | 0 |
7757 | cluster_0 | 7800 | 5500 | 8000 | 6000 | 538.5164807 |
7758 | cluster_0 | 7500 | 5400 | 7800 | 5500 | 316.227766 |
7759 | cluster_0 | 7000 | 5000 | 7500 | 5400 | 640.3124237 |
How could I consolidate clusters 5, 6, and 7 into one cluster base on the Euclidian difference (Change variable) for LatM and LongM? I would be looking for Change < 200 as a cutoff, which is why cluster_0 in points 7757, 7758, and 7759 aren't included.
My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).
The result would look like this:
ID | cluster | LatM | Long | LagLatM | LagLongM | Change |
7729 | cluster_5 | 8000 | 6000 | . | . | . |
7730 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7731 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7732 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7733 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7734 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7735 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7736 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7737 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
Then the approach with:
data want;
set have;
lclus = lag(cluster);
If change < 200 then cluster = lclus;
drop lclus;
run;
Actually this could be in the same data step where you calculate the change value.
would work assuming that your data is sorted as you need.
Is the CHANGE value really the one you want to use for aggregating?
You could have CHANGE=0 on 2 clusters while having vary different latitudes in these clusters.
Not quite sure what you mean by consolidate. Is your goal to reduce the number of records or to have a single value in a variable to indicate membership?
As an aside instead of LatM - LagLatM, if LagLatM is the Lagged value you might be able to replace that code with Dif(LatM) or Abs(Dif(LatM)).
If you want to reduce the number records then something like:
If change = 0 then delete.
Grouping perhaps
lclus = lag(cluster);
If change < 200 then cluster = lclus;
If you could show what you think the output for your example data would look like it would help clarify things.
Hello user ballardw,
My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).
The result would look like this:
ID | cluster | LatM | Long | LagLatM | LagLongM | Change |
7729 | cluster_5 | 8000 | 6000 | . | . | . |
7730 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7731 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7732 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7733 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7734 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7735 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7736 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7737 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7738 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7739 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7740 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7741 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7742 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7743 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7744 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7745 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7746 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7747 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7748 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7749 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7750 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7751 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7752 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7753 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7754 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7755 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7756 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7757 | cluster_0 | 7800 | 5500 | 8000 | 6000 | 538.5164807 |
7758 | cluster_0 | 7500 | 5400 | 7800 | 5500 | 316.227766 |
7759 | cluster_0 | 7000 | 5000 | 7500 | 5400 |
640.3124237 |
Or like this:
ID | cluster | LatM | Long | LagLatM | LagLongM | Change |
7729 | cluster_1000 | 8000 | 6000 | . | . | . |
7730 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7731 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7732 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7733 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7734 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7735 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7736 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7737 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7738 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7739 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7740 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7741 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7742 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7743 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7744 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7745 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7746 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7747 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7748 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7749 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7750 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7751 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7752 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7753 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7754 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7755 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7756 | cluster_1000 | 8000 | 6000 | 8000 | 6000 | 0 |
7757 | cluster_0 | 7800 | 5500 | 8000 | 6000 | 538.5164807 |
7758 | cluster_0 | 7500 | 5400 | 7800 | 5500 | 316.227766 |
7759 | cluster_0 | 7000 | 5000 | 7500 | 5400 | 640.3124237 |
Thank you, and I hope this clarifies.
My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).
The result would look like this:
ID | cluster | LatM | Long | LagLatM | LagLongM | Change |
7729 | cluster_5 | 8000 | 6000 | . | . | . |
7730 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7731 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7732 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7733 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7734 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7735 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7736 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
7737 | cluster_5 | 8000 | 6000 | 8000 | 6000 | 0 |
Then the approach with:
data want;
set have;
lclus = lag(cluster);
If change < 200 then cluster = lclus;
drop lclus;
run;
Actually this could be in the same data step where you calculate the change value.
would work assuming that your data is sorted as you need.
Hello, user ChrisNZ. This is a sample set of data to prevent violation of personal information. I am looking for consolidation of these clusters when the variable CHANGE is less than 200.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.