Hello,
Can SAS perform fuzzy grouping?
That is, within a dataset (say 8000 records and 200 columns), I would like to find pairs of records that are likely to be duplicates or near-duplicates.
e.g.
id f1 f2 f3 f4 f5
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
4 3 4 7 8 9
5 1 3 3 3 3
6 1 2 3 4 5
Aim: (a) find the pairs of records that are exactly the same
(b) find the pairs of records that differ in 3 columns or fewer
Expected result:
(a) pair (1 - 6), identical in fields F1 to F5
(b) pair (3 - 4), differing in fields F3, F4, F5
pair (1 - 5), differing in fields F2, F4, F5
......
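To make the comparison I have in mind concrete, here is a minimal sketch in Python (not SAS; it only illustrates the logic, and the names `records`, `differing_fields` and `find_pairs` are made up for this example). It compares every pair of records field by field and counts the mismatches. This brute-force approach is an assumption on my part, not a known SAS feature; with 8000 records it would mean about 32 million pair comparisons.

```python
from itertools import combinations

# Sample data from the example above: id -> (f1, f2, f3, f4, f5)
records = {
    1: (1, 2, 3, 4, 5),
    2: (2, 3, 4, 5, 6),
    3: (3, 4, 5, 6, 7),
    4: (3, 4, 7, 8, 9),
    5: (1, 3, 3, 3, 3),
    6: (1, 2, 3, 4, 5),
}
FIELDS = ("F1", "F2", "F3", "F4", "F5")


def differing_fields(a, b):
    """Return the names of the fields in which records a and b differ."""
    return [name for name, x, y in zip(FIELDS, a, b) if x != y]


def find_pairs(records, max_diff=3):
    """Compare every pair of records once.

    Returns (exact, near):
      exact: list of (id_a, id_b) for identical records
      near:  list of (id_a, id_b, differing field names) for records
             that differ in at least 1 and at most max_diff fields
    """
    exact, near = [], []
    for (id_a, a), (id_b, b) in combinations(sorted(records.items()), 2):
        diff = differing_fields(a, b)
        if not diff:
            exact.append((id_a, id_b))
        elif len(diff) <= max_diff:
            near.append((id_a, id_b, diff))
    return exact, near


exact, near = find_pairs(records)
print("exact:", exact)
for id_a, id_b, diff in near:
    print(f"near: ({id_a} - {id_b}) differing in {', '.join(diff)}")
```

On the sample data this reports pair (1 - 6) as exact, and pairs (1 - 5), (3 - 4) and (5 - 6) as differing in 3 fields or fewer. I am looking for the SAS equivalent of this.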
Thanks for your help