BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LKlein88
Calcite | Level 5

Hello SAS Communities.  I have a question regarding consolidating clusters.

 

Suppose I have the following dataset where LagLatM and LagLongM at the functions lag(LatM) and lag(LongM) and Change is sqrt((LatM - LagLatM)**2 + (LongM - LagLongM)**2):

 

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_5 8000 6000 . . .
7730 cluster_5 8000 6000 8000 6000 0
7731 cluster_5 8000 6000 8000 6000 0
7732 cluster_5 8000 6000 8000 6000 0
7733 cluster_5 8000 6000 8000 6000 0
7734 cluster_5 8000 6000 8000 6000 0
7735 cluster_5 8000 6000 8000 6000 0
7736 cluster_5 8000 6000 8000 6000 0
7737 cluster_5 8000 6000 8000 6000 0
7738 cluster_5 8000 6000 8000 6000 0
7739 cluster_6 8000 6000 8000 6000 0
7740 cluster_6 8000 6000 8000 6000 0
7741 cluster_6 8000 6000 8000 6000 0
7742 cluster_6 8000 6000 8000 6000 0
7743 cluster_6 8000 6000 8000 6000 0
7744 cluster_6 8000 6000 8000 6000 0
7745 cluster_6 8000 6000 8000 6000 0
7746 cluster_6 8000 6000 8000 6000 0
7747 cluster_6 8000 6000 8000 6000 0
7748 cluster_6 8000 6000 8000 6000 0
7749 cluster_7 8000 6000 8000 6000 0
7750 cluster_7 8000 6000 8000 6000 0
7751 cluster_7 8000 6000 8000 6000 0
7752 cluster_7 8000 6000 8000 6000 0
7753 cluster_7 8000 6000 8000 6000 0
7754 cluster_7 8000 6000 8000 6000 0
7755 cluster_7 8000 6000 8000 6000 0
7756 cluster_7 8000 6000 8000 6000 0
7757 cluster_0 7800 5500 8000 6000 538.5164807
7758 cluster_0 7500 5400 7800 5500 316.227766
7759 cluster_0 7000 5000 7500 5400 640.3124237

How could I consolidate clusters 5, 6, and 7 into one cluster base on the Euclidian difference (Change variable) for LatM and LongM? I would be looking for Change < 200 as a cutoff, which is why cluster_0 in points 7757, 7758, and 7759 aren't included.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_5 8000 6000 . . .
7730 cluster_5 8000 6000 8000 6000 0
7731 cluster_5 8000 6000 8000 6000 0
7732 cluster_5 8000 6000 8000 6000 0
7733 cluster_5 8000 6000 8000 6000 0
7734 cluster_5 8000 6000 8000 6000 0
7735 cluster_5 8000 6000 8000 6000 0
7736 cluster_5 8000 6000 8000 6000 0
7737 cluster_5 8000 6000 8000 6000 0
 

 

Then the approach with:

data want;

   set have;

   lclus = lag(cluster);

   If change < 200 then cluster = lclus;

   drop lclus;

run;

 

Actually this could be in the same data step where you calculate the change value.

would work assuming that your data is sorted as you need.

View solution in original post

6 REPLIES 6
ChrisNZ
Tourmaline | Level 20

Is the CHANGE value really the one you want to use for aggregating?

You could have CHANGE=0 on 2 clusters while having vary different latitudes in these clusters.

ballardw
Super User

Not quite sure what you mean by consolidate. Is your goal to reduce the number of records or to have a single value in a variable to indicate membership?

 

As an aside instead of LatM - LagLatM, if LagLatM is the Lagged value you might be able to  replace that code with Dif(LatM) or Abs(Dif(LatM)).

 

If you want to reduce the number records then something like:

If change = 0 then delete.

 

Grouping perhaps

lclus = lag(cluster);

If change < 200 then cluster = lclus;

 

If you could show what you think the output for your example data would look like it would help clarify things.

LKlein88
Calcite | Level 5

Hello user ballardw,

 

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_5 8000 6000 . . .
7730 cluster_5 8000 6000 8000 6000 0
7731 cluster_5 8000 6000 8000 6000 0
7732 cluster_5 8000 6000 8000 6000 0
7733 cluster_5 8000 6000 8000 6000 0
7734 cluster_5 8000 6000 8000 6000 0
7735 cluster_5 8000 6000 8000 6000 0
7736 cluster_5 8000 6000 8000 6000 0
7737 cluster_5 8000 6000 8000 6000 0
7738 cluster_5 8000 6000 8000 6000 0
7739 cluster_5 8000 6000 8000 6000 0
7740 cluster_5 8000 6000 8000 6000 0
7741 cluster_5 8000 6000 8000 6000 0
7742 cluster_5 8000 6000 8000 6000 0
7743 cluster_5 8000 6000 8000 6000 0
7744 cluster_5 8000 6000 8000 6000 0
7745 cluster_5 8000 6000 8000 6000 0
7746 cluster_5 8000 6000 8000 6000 0
7747 cluster_5 8000 6000 8000 6000 0
7748 cluster_5 8000 6000 8000 6000 0
7749 cluster_5 8000 6000 8000 6000 0
7750 cluster_5 8000 6000 8000 6000 0
7751 cluster_5 8000 6000 8000 6000 0
7752 cluster_5 8000 6000 8000 6000 0
7753 cluster_5 8000 6000 8000 6000 0
7754 cluster_5 8000 6000 8000 6000 0
7755 cluster_5 8000 6000 8000 6000 0
7756 cluster_5 8000 6000 8000 6000 0
7757 cluster_0 7800 5500 8000 6000 538.5164807
7758 cluster_0 7500 5400 7800 5500 316.227766
7759 cluster_0 7000 5000 7500 5400

640.3124237

 

Or like this:

 

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_1000 8000 6000 . . .
7730 cluster_1000 8000 6000 8000 6000 0
7731 cluster_1000 8000 6000 8000 6000 0
7732 cluster_1000 8000 6000 8000 6000 0
7733 cluster_1000 8000 6000 8000 6000 0
7734 cluster_1000 8000 6000 8000 6000 0
7735 cluster_1000 8000 6000 8000 6000 0
7736 cluster_1000 8000 6000 8000 6000 0
7737 cluster_1000 8000 6000 8000 6000 0
7738 cluster_1000 8000 6000 8000 6000 0
7739 cluster_1000 8000 6000 8000 6000 0
7740 cluster_1000 8000 6000 8000 6000 0
7741 cluster_1000 8000 6000 8000 6000 0
7742 cluster_1000 8000 6000 8000 6000 0
7743 cluster_1000 8000 6000 8000 6000 0
7744 cluster_1000 8000 6000 8000 6000 0
7745 cluster_1000 8000 6000 8000 6000 0
7746 cluster_1000 8000 6000 8000 6000 0
7747 cluster_1000 8000 6000 8000 6000 0
7748 cluster_1000 8000 6000 8000 6000 0
7749 cluster_1000 8000 6000 8000 6000 0
7750 cluster_1000 8000 6000 8000 6000 0
7751 cluster_1000 8000 6000 8000 6000 0
7752 cluster_1000 8000 6000 8000 6000 0
7753 cluster_1000 8000 6000 8000 6000 0
7754 cluster_1000 8000 6000 8000 6000 0
7755 cluster_1000 8000 6000 8000 6000 0
7756 cluster_1000 8000 6000 8000 6000 0
7757 cluster_0 7800 5500 8000 6000 538.5164807
7758 cluster_0 7500 5400 7800 5500 316.227766
7759 cluster_0 7000 5000 7500 5400 640.3124237

 

Thank you, and I hope this clarifies.

ballardw
Super User

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_5 8000 6000 . . .
7730 cluster_5 8000 6000 8000 6000 0
7731 cluster_5 8000 6000 8000 6000 0
7732 cluster_5 8000 6000 8000 6000 0
7733 cluster_5 8000 6000 8000 6000 0
7734 cluster_5 8000 6000 8000 6000 0
7735 cluster_5 8000 6000 8000 6000 0
7736 cluster_5 8000 6000 8000 6000 0
7737 cluster_5 8000 6000 8000 6000 0
 

 

Then the approach with:

data want;

   set have;

   lclus = lag(cluster);

   If change < 200 then cluster = lclus;

   drop lclus;

run;

 

Actually this could be in the same data step where you calculate the change value.

would work assuming that your data is sorted as you need.

LKlein88
Calcite | Level 5
Thanks, user ballardw! This works well in context of the study.
LKlein88
Calcite | Level 5

Hello, user ChrisNZ.  This is a sample set of data to prevent violation of personal information.  I am looking for consolidation of these clusters when the variable CHANGE is less than 200.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 740 views
  • 0 likes
  • 3 in conversation