DATA Step, Macro, Functions and more

Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Hello SAS Communities.  I have a question regarding consolidating clusters.

 

Suppose I have the following dataset where LagLatM and LagLongM at the functions lag(LatM) and lag(LongM) and Change is sqrt((LatM - LagLatM)**2 + (LongM - LagLongM)**2):

 

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_5 8000 6000 . . .
7730 cluster_5 8000 6000 8000 6000 0
7731 cluster_5 8000 6000 8000 6000 0
7732 cluster_5 8000 6000 8000 6000 0
7733 cluster_5 8000 6000 8000 6000 0
7734 cluster_5 8000 6000 8000 6000 0
7735 cluster_5 8000 6000 8000 6000 0
7736 cluster_5 8000 6000 8000 6000 0
7737 cluster_5 8000 6000 8000 6000 0
7738 cluster_5 8000 6000 8000 6000 0
7739 cluster_6 8000 6000 8000 6000 0
7740 cluster_6 8000 6000 8000 6000 0
7741 cluster_6 8000 6000 8000 6000 0
7742 cluster_6 8000 6000 8000 6000 0
7743 cluster_6 8000 6000 8000 6000 0
7744 cluster_6 8000 6000 8000 6000 0
7745 cluster_6 8000 6000 8000 6000 0
7746 cluster_6 8000 6000 8000 6000 0
7747 cluster_6 8000 6000 8000 6000 0
7748 cluster_6 8000 6000 8000 6000 0
7749 cluster_7 8000 6000 8000 6000 0
7750 cluster_7 8000 6000 8000 6000 0
7751 cluster_7 8000 6000 8000 6000 0
7752 cluster_7 8000 6000 8000 6000 0
7753 cluster_7 8000 6000 8000 6000 0
7754 cluster_7 8000 6000 8000 6000 0
7755 cluster_7 8000 6000 8000 6000 0
7756 cluster_7 8000 6000 8000 6000 0
7757 cluster_0 7800 5500 8000 6000 538.5164807
7758 cluster_0 7500 5400 7800 5500 316.227766
7759 cluster_0 7000 5000 7500 5400 640.3124237

How could I consolidate clusters 5, 6, and 7 into one cluster base on the Euclidian difference (Change variable) for LatM and LongM? I would be looking for Change < 200 as a cutoff, which is why cluster_0 in points 7757, 7758, and 7759 aren't included.


Accepted Solutions
Solution
‎01-11-2017 03:38 PM
Super User
Posts: 10,466

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_5 8000 6000 . . .
7730 cluster_5 8000 6000 8000 6000 0
7731 cluster_5 8000 6000 8000 6000 0
7732 cluster_5 8000 6000 8000 6000 0
7733 cluster_5 8000 6000 8000 6000 0
7734 cluster_5 8000 6000 8000 6000 0
7735 cluster_5 8000 6000 8000 6000 0
7736 cluster_5 8000 6000 8000 6000 0
7737 cluster_5 8000 6000 8000 6000 0
 

 

Then the approach with:

data want;

   set have;

   lclus = lag(cluster);

   If change < 200 then cluster = lclus;

   drop lclus;

run;

 

Actually this could be in the same data step where you calculate the change value.

would work assuming that your data is sorted as you need.

View solution in original post


All Replies
PROC Star
Posts: 1,555

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Is the CHANGE value really the one you want to use for aggregating?

You could have CHANGE=0 on 2 clusters while having vary different latitudes in these clusters.

Super User
Posts: 10,466

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

[ Edited ]

Not quite sure what you mean by consolidate. Is your goal to reduce the number of records or to have a single value in a variable to indicate membership?

 

As an aside instead of LatM - LagLatM, if LagLatM is the Lagged value you might be able to  replace that code with Dif(LatM) or Abs(Dif(LatM)).

 

If you want to reduce the number records then something like:

If change = 0 then delete.

 

Grouping perhaps

lclus = lag(cluster);

If change < 200 then cluster = lclus;

 

If you could show what you think the output for your example data would look like it would help clarify things.

Occasional Contributor
Posts: 10

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Hello user ballardw,

 

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_5 8000 6000 . . .
7730 cluster_5 8000 6000 8000 6000 0
7731 cluster_5 8000 6000 8000 6000 0
7732 cluster_5 8000 6000 8000 6000 0
7733 cluster_5 8000 6000 8000 6000 0
7734 cluster_5 8000 6000 8000 6000 0
7735 cluster_5 8000 6000 8000 6000 0
7736 cluster_5 8000 6000 8000 6000 0
7737 cluster_5 8000 6000 8000 6000 0
7738 cluster_5 8000 6000 8000 6000 0
7739 cluster_5 8000 6000 8000 6000 0
7740 cluster_5 8000 6000 8000 6000 0
7741 cluster_5 8000 6000 8000 6000 0
7742 cluster_5 8000 6000 8000 6000 0
7743 cluster_5 8000 6000 8000 6000 0
7744 cluster_5 8000 6000 8000 6000 0
7745 cluster_5 8000 6000 8000 6000 0
7746 cluster_5 8000 6000 8000 6000 0
7747 cluster_5 8000 6000 8000 6000 0
7748 cluster_5 8000 6000 8000 6000 0
7749 cluster_5 8000 6000 8000 6000 0
7750 cluster_5 8000 6000 8000 6000 0
7751 cluster_5 8000 6000 8000 6000 0
7752 cluster_5 8000 6000 8000 6000 0
7753 cluster_5 8000 6000 8000 6000 0
7754 cluster_5 8000 6000 8000 6000 0
7755 cluster_5 8000 6000 8000 6000 0
7756 cluster_5 8000 6000 8000 6000 0
7757 cluster_0 7800 5500 8000 6000 538.5164807
7758 cluster_0 7500 5400 7800 5500 316.227766
7759 cluster_0 7000 5000 7500 5400

640.3124237

 

Or like this:

 

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_1000 8000 6000 . . .
7730 cluster_1000 8000 6000 8000 6000 0
7731 cluster_1000 8000 6000 8000 6000 0
7732 cluster_1000 8000 6000 8000 6000 0
7733 cluster_1000 8000 6000 8000 6000 0
7734 cluster_1000 8000 6000 8000 6000 0
7735 cluster_1000 8000 6000 8000 6000 0
7736 cluster_1000 8000 6000 8000 6000 0
7737 cluster_1000 8000 6000 8000 6000 0
7738 cluster_1000 8000 6000 8000 6000 0
7739 cluster_1000 8000 6000 8000 6000 0
7740 cluster_1000 8000 6000 8000 6000 0
7741 cluster_1000 8000 6000 8000 6000 0
7742 cluster_1000 8000 6000 8000 6000 0
7743 cluster_1000 8000 6000 8000 6000 0
7744 cluster_1000 8000 6000 8000 6000 0
7745 cluster_1000 8000 6000 8000 6000 0
7746 cluster_1000 8000 6000 8000 6000 0
7747 cluster_1000 8000 6000 8000 6000 0
7748 cluster_1000 8000 6000 8000 6000 0
7749 cluster_1000 8000 6000 8000 6000 0
7750 cluster_1000 8000 6000 8000 6000 0
7751 cluster_1000 8000 6000 8000 6000 0
7752 cluster_1000 8000 6000 8000 6000 0
7753 cluster_1000 8000 6000 8000 6000 0
7754 cluster_1000 8000 6000 8000 6000 0
7755 cluster_1000 8000 6000 8000 6000 0
7756 cluster_1000 8000 6000 8000 6000 0
7757 cluster_0 7800 5500 8000 6000 538.5164807
7758 cluster_0 7500 5400 7800 5500 316.227766
7759 cluster_0 7000 5000 7500 5400 640.3124237

 

Thank you, and I hope this clarifies.

Solution
‎01-11-2017 03:38 PM
Super User
Posts: 10,466

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID cluster LatM Long LagLatM LagLongM Change
7729 cluster_5 8000 6000 . . .
7730 cluster_5 8000 6000 8000 6000 0
7731 cluster_5 8000 6000 8000 6000 0
7732 cluster_5 8000 6000 8000 6000 0
7733 cluster_5 8000 6000 8000 6000 0
7734 cluster_5 8000 6000 8000 6000 0
7735 cluster_5 8000 6000 8000 6000 0
7736 cluster_5 8000 6000 8000 6000 0
7737 cluster_5 8000 6000 8000 6000 0
 

 

Then the approach with:

data want;

   set have;

   lclus = lag(cluster);

   If change < 200 then cluster = lclus;

   drop lclus;

run;

 

Actually this could be in the same data step where you calculate the change value.

would work assuming that your data is sorted as you need.

Occasional Contributor
Posts: 10

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Thanks, user ballardw! This works well in context of the study.
Occasional Contributor
Posts: 10

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Hello, user ChrisNZ.  This is a sample set of data to prevent violation of personal information.  I am looking for consolidation of these clusters when the variable CHANGE is less than 200.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 131 views
  • 0 likes
  • 3 in conversation