This is connection to my previous post found in this link:
https://communities.sas.com/t5/Base-SAS-Programming/Clustering-Links/m-p/264960#M52032
Given that I have already performed the household linkages previously, and executed the household tagging again and there were changes, how can I properly update the household tagging.
I have provided some sample data as well as the data i'm working on(tmp) to update the clusters. I didn't include my current code to update as it doesn't provide the desired results but I created the desired output for your reference.
In the result,
ids 1,2 retain their old household tag;
id=3 got included on the same household with 1 and 2;
id = 4 retain and id = 5 moved to a different household;
ids 8 and 9 are inserted.
Appreciate any recommendations on how to acheive this.
data old;
	input household id;
	datalines;
1 1
1 2
2 4
2 5
3 6 
3 7
;
data new;
	input household id;
	datalines;
2 1
2 2
2 3
4 5
4 6
4 7
5 8
5 9
;
proc sql;
	create table tmp as
	select
		a.household,
		coalesce(a.id,b.id) as id,
		b.household as new_household
	from
		old a full outer join new b
	on(a.id=b.id)
	;
quit;
data desired;
	input household id;
	datalines;
1 1
1 2
1 3
2 4
3 5
3 6
3 7
4 8
4 9
;
I don't understand what you are talking about.
What is your table HAVE ? and What is your table WANT ?
Hi Keshan,
The old and new datasets are my HAVE and the desired table is my WANT.
The household ids from old dataset must be retained for each id, given that he didn't move from a different household on the new dataset.
I still can't get your point.
Could you explain How you could use OLD and NEW to get DESIRE ?
Old contains let's say the first run using the hash lookup solution you created (thanks again for that one!). So that one created this household grouping
household 1 - {1,4}
household 2 - {4,5}
household 3 - {6,7}
In the future I would like to execute this household exercise again, chances are the household ids we used will not be the same as its incremental, so that new run will have something like this.
household 2 - {1,2,3}
household 4 - {5,6,7}
household 5 - {8,9} (new household not existing in the previous run)
Next thing I want to do is to match the original result and the new result. using the household ids there as still the "real ids".
household 1 - {1,2,3}
household 2 - {4}
household 3 - {5,6,7} (id 5 will have its household replaced from 2 to 3)
household 4 - {8,9} (inserted to the original set, then getting the max + 1 as household id)
If you're questioning why 5 got moved to a different household, this is an example that there was no links between 4 and 5, but 5,6,7 had so it moved to a different household.
Sorry if my explanation is a bit confusing.
You didn't answer my question yet ?
I have some spare time, so I am watching for a while .
Here is a question.
In table NEW
4 5 -> 2
4 6 -> 3
4 7 -> 3
 So you got 
5 3
6 3
7 3
What if 5 6 7 map into different OLD household Like , what you are going to do ?
4 5 -> 2
4 6 -> 3
4 7 -> 4
For this instance,
id -> household
5 -> 3 (assuming it's the first record to be processed)
then the max household id of the original should be identified (which is 4) then increment to assign household to id 6 and 7
id -> household
6 -> 5
7 -> 6
I just noticed one of the confusions I guess I put in values wrongly in the desired earlier. My apologies on that.
data desired;
	input household id;
	datalines;
1 1
1 2
1 3
2 4
3 5
3 6
3 7
4 8
4 9;
I cannot see any realtionship that matches household 4 to id 8 and 9.
I think there is some conditonal assignment that you are not making the logic clear. I can hard code a result that will create your result but the genera solution is not at all obvious.
What is the purpose of your posted TMP data set?
Yes there is some logic to build the household tagging initially found in this post https://communities.sas.com/t5/Base-SAS-Programming/Clustering-Links/m-p/264960#M52032.
The TMP dataset was suppose combine the old and new result. But I think it wouldn't work so I had to consult people here in the community who might have better thoughts or approaches on this one.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
