I have an idea, but I am not sure how to implement it:
1. Build a dataset of all distinct IDs.
2. Pick the first ID and find every ID that occurs with it in either ID1 or ID2; put them all in cluster 1.
3. For each ID newly added to the cluster, repeat step 2 and keep updating the cluster.
4. Loop until every ID in cluster 1 has been processed.
5. Remove the cluster 1 IDs from both the distinct-ID dataset and the dataset being searched.
6. Repeat steps 2-5 with the next cluster number until all IDs have been assigned.
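The steps above could be sketched roughly as follows. This is only a minimal illustration in Python (the language, the function name `cluster_ids`, and the input as a list of (ID1, ID2) tuples are all assumptions, not something from the original data). Steps 2-4 amount to a breadth-first search over the pairs; instead of physically deleting rows in step 5, the sketch simply skips any ID that already has a cluster number.

```python
from collections import defaultdict, deque

def cluster_ids(pairs):
    """Map every ID to a cluster number; IDs linked directly or
    indirectly through any (ID1, ID2) pair share a number."""
    # Record, for every ID, the IDs it occurs with in either column.
    neighbours = defaultdict(set)
    for id1, id2 in pairs:
        neighbours[id1].add(id2)
        neighbours[id2].add(id1)

    cluster_of = {}      # ID -> cluster number
    next_cluster = 1
    for start in sorted(neighbours):        # step 1: all distinct IDs
        if start in cluster_of:             # step 5: already assigned, skip
            continue
        cluster_of[start] = next_cluster    # step 2: seed the cluster
        queue = deque([start])
        while queue:                        # steps 3-4: expand until exhausted
            current = queue.popleft()
            for other in neighbours[current]:
                if other not in cluster_of:
                    cluster_of[other] = next_cluster
                    queue.append(other)
        next_cluster += 1                   # step 6: move to the next cluster
    return cluster_of

# The first three pairs come from the example in this post;
# ("A4", "A5") is a hypothetical extra pair added to show a second cluster.
pairs = [("A1", "A2"), ("A1", "A9"), ("A2", "A3"), ("A4", "A5")]
print(cluster_ids(pairs))
```

Note that an ID only appears in the result if it occurs in at least one pair; an ID with no pairs would need to be added as its own cluster separately.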
Example:
Distinct IDs
A1
A2
A3
A4
A5
A6
A7
A8
A9
First ID: A1
All possible combinations in the dataset:
A1 A2
A1 A9
Cluster 1: A1,A2,A9
Loop through cluster 1 except A1. For A2, possible combinations:
A2 A3
Update cluster 1: A1,A2,A3,A9
Loop through cluster 1 except A1,A2. For A9, possible combinations:
A9 A1
Update cluster 1: A1,A2,A3,A9
Eliminate A1,A2,A3,A9 from the distinct IDs and from every observation in the dataset where they occur.
Repeat the steps above, incrementing the cluster number each time.
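To check the walkthrough, here is a small Python sketch that grows one cluster and records its contents after each ID is expanded. The function name `grow_cluster` is made up for illustration, and the sketch assumes the three pairs listed in the example (A1 A2, A1 A9, A2 A3) are the only rows involving these IDs. It also expands A3, which joins the cluster late and would still need to be checked under step 4.

```python
def grow_cluster(pairs, first_id):
    """Return (expanded ID, sorted cluster members) after each expansion."""
    cluster = [first_id]     # a list keeps the order in which IDs were added
    snapshots = []
    position = 0
    while position < len(cluster):       # stop once every member is expanded
        current = cluster[position]
        for id1, id2 in pairs:
            # Look for the current ID in either column and add its partner.
            if id1 == current and id2 not in cluster:
                cluster.append(id2)
            elif id2 == current and id1 not in cluster:
                cluster.append(id1)
        snapshots.append((current, sorted(cluster)))
        position += 1
    return snapshots

# Pairs taken from the example above (A9 A1 is the same row as A1 A9).
example_pairs = [("A1", "A2"), ("A1", "A9"), ("A2", "A3")]
for current, members in grow_cluster(example_pairs, "A1"):
    print(current, "->", ",".join(members))
# A1 -> A1,A2,A9
# A2 -> A1,A2,A3,A9
# A9 -> A1,A2,A3,A9
# A3 -> A1,A2,A3,A9
```

Scanning every pair for each ID is fine for a small example but slow on a large dataset; building a lookup of partners per ID first would avoid the repeated scans.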