DATA Step, Macro, Functions and more

Updating Clusters

Reply
Super Contributor
Posts: 326

Updating Clusters

[ Edited ]

This is connection to my previous post found in this link:

https://communities.sas.com/t5/Base-SAS-Programming/Clustering-Links/m-p/264960#M52032

 

Given that I have already performed the household linkages previously, and executed the household tagging again and there were changes, how can I properly update the household tagging.

 

I have provided some sample data as well as the data i'm working on(tmp) to update the clusters. I didn't include my current code to update as it doesn't provide the desired results but I created the desired output for your reference.

 

In the result,

ids 1,2 retain their old household tag;

id=3 got included on the same household with 1 and 2;

id = 4 retain and id = 5 moved to a different household;

ids 8 and 9 are inserted.

 

Appreciate any recommendations on how to acheive this.

 



data old; input household id; datalines; 1 1 1 2 2 4 2 5 3 6 3 7 ; data new; input household id; datalines; 2 1 2 2 2 3 4 5 4 6 4 7 5 8 5 9 ; proc sql; create table tmp as select a.household, coalesce(a.id,b.id) as id, b.household as new_household from old a full outer join new b on(a.id=b.id) ; quit; data desired; input household id; datalines; 1 1 1 2 1 3 2 4 3 5 3 6 3 7 4 8 4 9
;


 

 

 

 

Super User
Posts: 10,023

Re: Updating Clusters

I don't understand what you are talking about.

What is your table HAVE ? and What is your table WANT ?

Super Contributor
Posts: 326

Re: Updating Clusters

Hi Keshan,

 

The old and new datasets are my HAVE and the desired table is my WANT.

 

 

The household ids from old dataset must be retained for each id, given that he didn't move from a different household on the new dataset.

 

Super User
Posts: 10,023

Re: Updating Clusters

I still can't get your point.

Could you explain How you could use OLD and NEW to get DESIRE ?

Super Contributor
Posts: 326

Re: Updating Clusters

Old contains let's say the first run using the hash lookup solution you created (thanks again for that one!). So that one created this household grouping

 

household 1 - {1,4}

household 2 - {4,5}

household 3 - {6,7}

 

In the future I would like to execute this household exercise again, chances are the household ids we used will not be the same as its incremental, so that new run will have something like this.

 

household 2 - {1,2,3}

household 4 - {5,6,7}

household 5 - {8,9} (new household not existing in the previous run)

 

Next thing I want to do is to match the original result and the new result. using the household ids there as still the "real ids". 

 

household 1 - {1,2,3}

household 2 - {4} 

household 3 - {5,6,7} (id 5 will have its household replaced from 2 to 3)

household 4 - {8,9} (inserted to the original set, then getting the max + 1 as household id)

 

If you're questioning why 5 got moved to a different household, this is an example that there was no links between 4 and 5, but 5,6,7 had so it moved to a different household.

 

Sorry if my explanation is a bit confusing.

Super User
Posts: 10,023

Re: Updating Clusters

You didn't answer my question yet ?

Super User
Posts: 10,023

Re: Updating Clusters

[ Edited ]

I have some spare time, so I am watching for a while .

Here is a question.

 

In table NEW

 

4 5 -> 2
4 6 -> 3
4 7 -> 3

 So you got
5 3
6 3
7 3
What if 5 6 7 map into different OLD household Like , what you are going to do ?

4 5 -> 2
4 6 -> 3
4 7 -> 4

Super Contributor
Posts: 326

Re: Updating Clusters

For this instance,

 

id -> household

5 -> 3 (assuming it's the first record to be processed)

 

then the max household id of the original should be identified (which is 4) then increment to assign household to id 6 and 7

id -> household

6 -> 5

7 -> 6

 

I just noticed one of the confusions I guess I put in values wrongly in the desired earlier. My apologies on that.

 

data desired;
	input household id;
	datalines;
1 1
1 2
1 3
2 4
3 5
3 6
3 7
4 8
4 9;

 

Super User
Posts: 11,343

Re: Updating Clusters

I cannot see any realtionship that matches household 4 to id 8 and 9.

 

I think there is some conditonal assignment that you are not making the logic clear. I can hard code a result that will create your result but the genera solution is not at all obvious.

 

What is the purpose of your posted TMP data set?

 

 

Super Contributor
Posts: 326

Re: Updating Clusters

Yes there is some logic to build the household tagging initially found in this post https://communities.sas.com/t5/Base-SAS-Programming/Clustering-Links/m-p/264960#M52032.

 

The TMP dataset was suppose combine the old and new result. But I think it wouldn't work so I had to consult people here in the community who might have better thoughts or approaches on this one.

 

 

Ask a Question
Discussion stats
  • 9 replies
  • 410 views
  • 0 likes
  • 3 in conversation