I have several records that are duplicates of each other, although SAS does not see them as duplicates. (They were identified by fuzzy matching.) The only way to tell that they belong together is by following their links: A is related to B, B to E, and E to A, so A, B, and E form one cluster.
I need to assign an ID to each cluster so that I can run code by ClusterID. The next step will be to remove some records from each cluster based on additional requirements.
This is what I have:
data have;
input left $ right $;
cards;
A B
B E
C D
D C
E A
F G
;
run;
proc print data=have;run;
I want to assign a unique ID to each cluster. A simple counter is fine, and the values don't need to be consecutive.
I need the data to look like this:
Want:
left right ClusterID
A B 1
B E 1
C D 2
D C 2
E A 1
F G 3
Any ideas?
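One direction that might work, as a rough sketch rather than a tested solution: treat each row as a link in an undirected graph and label its connected components. The code below assumes SAS/OR is licensed (PROC OPTNET is part of it). The dataset names nodes and want are made up for illustration, and it assumes the OUT_NODES= table carries the variables node and concomp. It is worth checking in the log how the reversed pair C D / D C is handled.

```sas
/* Sketch only: assumes SAS/OR is available, since PROC OPTNET ships with it. */
/* Each row of HAVE is treated as an undirected link between LEFT and RIGHT.  */
proc optnet data_links=have out_nodes=nodes;
   data_links_var from=left to=right;
   concomp;   /* label the connected components of the graph */
run;

/* NODES is assumed to hold one row per distinct value (NODE) together with  */
/* its component number (CONCOMP). Joining on LEFT tags every row of HAVE.   */
proc sql;
   create table want as
   select h.left, h.right, n.concomp as ClusterID
   from have as h
        left join nodes as n
        on h.left = n.node
   order by h.left, h.right;
quit;
```

Joining on left alone should be enough, because both ends of a link always fall in the same component. The component numbers may not match the 1, 2, 3 shown above, which is fine as long as each cluster gets its own value.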