Solved: Re: Question Concerning Consolidating Values for a Variable based on V...

LKlein88 · Posted 01-10-2017 04:01 PM

Hello SAS Communities. I have a question regarding consolidating clusters.

Suppose I have the following dataset where LagLatM and LagLongM at the functions lag(LatM) and lag(LongM) and Change is sqrt((LatM - LagLatM)**2 + (LongM - LagLongM)**2):

ID	cluster	LatM	Long	LagLatM	LagLongM	Change
7729	cluster_5	8000	6000	.	.	.
7730	cluster_5	8000	6000	8000	6000	0
7731	cluster_5	8000	6000	8000	6000	0
7732	cluster_5	8000	6000	8000	6000	0
7733	cluster_5	8000	6000	8000	6000	0
7734	cluster_5	8000	6000	8000	6000	0
7735	cluster_5	8000	6000	8000	6000	0
7736	cluster_5	8000	6000	8000	6000	0
7737	cluster_5	8000	6000	8000	6000	0
7738	cluster_5	8000	6000	8000	6000	0
7739	cluster_6	8000	6000	8000	6000	0
7740	cluster_6	8000	6000	8000	6000	0
7741	cluster_6	8000	6000	8000	6000	0
7742	cluster_6	8000	6000	8000	6000	0
7743	cluster_6	8000	6000	8000	6000	0
7744	cluster_6	8000	6000	8000	6000	0
7745	cluster_6	8000	6000	8000	6000	0
7746	cluster_6	8000	6000	8000	6000	0
7747	cluster_6	8000	6000	8000	6000	0
7748	cluster_6	8000	6000	8000	6000	0
7749	cluster_7	8000	6000	8000	6000	0
7750	cluster_7	8000	6000	8000	6000	0
7751	cluster_7	8000	6000	8000	6000	0
7752	cluster_7	8000	6000	8000	6000	0
7753	cluster_7	8000	6000	8000	6000	0
7754	cluster_7	8000	6000	8000	6000	0
7755	cluster_7	8000	6000	8000	6000	0
7756	cluster_7	8000	6000	8000	6000	0
7757	cluster_0	7800	5500	8000	6000	538.5164807
7758	cluster_0	7500	5400	7800	5500	316.227766
7759	cluster_0	7000	5000	7500	5400	640.3124237

How could I consolidate clusters 5, 6, and 7 into one cluster base on the Euclidian difference (Change variable) for LatM and LongM? I would be looking for Change < 200 as a cutoff, which is why cluster_0 in points 7757, 7758, and 7759 aren't included.

ballardw · Posted 01-10-2017 06:52 PM

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID	cluster	LatM	Long	LagLatM	LagLongM	Change
7729	cluster_5	8000	6000	.	.	.
7730	cluster_5	8000	6000	8000	6000	0
7731	cluster_5	8000	6000	8000	6000	0
7732	cluster_5	8000	6000	8000	6000	0
7733	cluster_5	8000	6000	8000	6000	0
7734	cluster_5	8000	6000	8000	6000	0
7735	cluster_5	8000	6000	8000	6000	0
7736	cluster_5	8000	6000	8000	6000	0
7737	cluster_5	8000	6000	8000	6000	0

Then the approach with:

data want;

set have;

lclus = lag(cluster);

If change < 200 then cluster = lclus;

drop lclus;

run;

Actually this could be in the same data step where you calculate the change value.

would work assuming that your data is sorted as you need.

View solution in original post

ChrisNZ · Posted 01-10-2017 04:34 PM

Is the CHANGE value really the one you want to use for aggregating?

You could have CHANGE=0 on 2 clusters while having vary different latitudes in these clusters.

High-Performance SAS Coding - Third Edition

ballardw · Posted 01-10-2017 04:37 PM

Not quite sure what you mean by consolidate. Is your goal to reduce the number of records or to have a single value in a variable to indicate membership?

As an aside instead of LatM - LagLatM, if LagLatM is the Lagged value you might be able to replace that code with Dif(LatM) or Abs(Dif(LatM)).

If you want to reduce the number records then something like:

If change = 0 then delete.

Grouping perhaps

lclus = lag(cluster);

If change < 200 then cluster = lclus;

If you could show what you think the output for your example data would look like it would help clarify things.

LKlein88 · Posted 01-10-2017 04:43 PM

Hello user ballardw,

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID	cluster	LatM	Long	LagLatM	LagLongM	Change
7729	cluster_5	8000	6000	.	.	.
7730	cluster_5	8000	6000	8000	6000	0
7731	cluster_5	8000	6000	8000	6000	0
7732	cluster_5	8000	6000	8000	6000	0
7733	cluster_5	8000	6000	8000	6000	0
7734	cluster_5	8000	6000	8000	6000	0
7735	cluster_5	8000	6000	8000	6000	0
7736	cluster_5	8000	6000	8000	6000	0
7737	cluster_5	8000	6000	8000	6000	0
7738	cluster_5	8000	6000	8000	6000	0
7739	cluster_5	8000	6000	8000	6000	0
7740	cluster_5	8000	6000	8000	6000	0
7741	cluster_5	8000	6000	8000	6000	0
7742	cluster_5	8000	6000	8000	6000	0
7743	cluster_5	8000	6000	8000	6000	0
7744	cluster_5	8000	6000	8000	6000	0
7745	cluster_5	8000	6000	8000	6000	0
7746	cluster_5	8000	6000	8000	6000	0
7747	cluster_5	8000	6000	8000	6000	0
7748	cluster_5	8000	6000	8000	6000	0
7749	cluster_5	8000	6000	8000	6000	0
7750	cluster_5	8000	6000	8000	6000	0
7751	cluster_5	8000	6000	8000	6000	0
7752	cluster_5	8000	6000	8000	6000	0
7753	cluster_5	8000	6000	8000	6000	0
7754	cluster_5	8000	6000	8000	6000	0
7755	cluster_5	8000	6000	8000	6000	0
7756	cluster_5	8000	6000	8000	6000	0
7757	cluster_0	7800	5500	8000	6000	538.5164807
7758	cluster_0	7500	5400	7800	5500	316.227766
7759	cluster_0	7000	5000	7500	5400	640.3124237

Or like this:

ID	cluster	LatM	Long	LagLatM	LagLongM	Change
7729	cluster_1000	8000	6000	.	.	.
7730	cluster_1000	8000	6000	8000	6000	0
7731	cluster_1000	8000	6000	8000	6000	0
7732	cluster_1000	8000	6000	8000	6000	0
7733	cluster_1000	8000	6000	8000	6000	0
7734	cluster_1000	8000	6000	8000	6000	0
7735	cluster_1000	8000	6000	8000	6000	0
7736	cluster_1000	8000	6000	8000	6000	0
7737	cluster_1000	8000	6000	8000	6000	0
7738	cluster_1000	8000	6000	8000	6000	0
7739	cluster_1000	8000	6000	8000	6000	0
7740	cluster_1000	8000	6000	8000	6000	0
7741	cluster_1000	8000	6000	8000	6000	0
7742	cluster_1000	8000	6000	8000	6000	0
7743	cluster_1000	8000	6000	8000	6000	0
7744	cluster_1000	8000	6000	8000	6000	0
7745	cluster_1000	8000	6000	8000	6000	0
7746	cluster_1000	8000	6000	8000	6000	0
7747	cluster_1000	8000	6000	8000	6000	0
7748	cluster_1000	8000	6000	8000	6000	0
7749	cluster_1000	8000	6000	8000	6000	0
7750	cluster_1000	8000	6000	8000	6000	0
7751	cluster_1000	8000	6000	8000	6000	0
7752	cluster_1000	8000	6000	8000	6000	0
7753	cluster_1000	8000	6000	8000	6000	0
7754	cluster_1000	8000	6000	8000	6000	0
7755	cluster_1000	8000	6000	8000	6000	0
7756	cluster_1000	8000	6000	8000	6000	0
7757	cluster_0	7800	5500	8000	6000	538.5164807
7758	cluster_0	7500	5400	7800	5500	316.227766
7759	cluster_0	7000	5000	7500	5400	640.3124237

Thank you, and I hope this clarifies.

ballardw · Posted 01-10-2017 06:52 PM

My goal is to have entries 7729 - 7756 as a uniform cluster, Cluster_5 (or if possible a different name like Cluster_1000).

The result would look like this:

ID	cluster	LatM	Long	LagLatM	LagLongM	Change
7729	cluster_5	8000	6000	.	.	.
7730	cluster_5	8000	6000	8000	6000	0
7731	cluster_5	8000	6000	8000	6000	0
7732	cluster_5	8000	6000	8000	6000	0
7733	cluster_5	8000	6000	8000	6000	0
7734	cluster_5	8000	6000	8000	6000	0
7735	cluster_5	8000	6000	8000	6000	0
7736	cluster_5	8000	6000	8000	6000	0
7737	cluster_5	8000	6000	8000	6000	0

Then the approach with:

data want;

set have;

lclus = lag(cluster);

If change < 200 then cluster = lclus;

drop lclus;

run;

Actually this could be in the same data step where you calculate the change value.

would work assuming that your data is sorted as you need.

LKlein88 · Posted 01-11-2017 03:38 PM

Thanks, user ballardw! This works well in context of the study.

LKlein88 · Posted 01-10-2017 04:39 PM

Hello, user ChrisNZ. This is a sample set of data to prevent violation of personal information. I am looking for consolidation of these clusters when the variable CHANGE is less than 200.

Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Re: Question Concerning Consolidating Values for a Variable based on Value of Another Variable

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away