Actually, my goal is to find the linkages: if a pan1 value is repeated in pan2, pan3, or add1, those observations should get the same Hid. It is not limited to pan1; a match in any of these columns counts.
Example:
obs  pan1  pan2  pan3  add1  hid
1    aaa   bbb   ccc   ddd   1
2    qqq   rrr   www   aaa   1
3    rrr   ppp   mmm   lll   1
4    uuu   zzz   ffff  ppp   1
5    p     l     m     n     2
6    jjjj  eee   rrr   ooo   1
7    <all blanks>            3
8    sss   www   .     .     1
9    .     .     .     eee   1
In this example, the first obs gets Hid 1.
In obs 2, add1 aaa matches pan1 aaa of obs 1.
In obs 3, pan1 rrr matches pan2 rrr of obs 2.
In obs 4, add1 ppp matches pan2 ppp of obs 3.
Obs 5 is unique, with no match, so it gets Hid 2.
In obs 6, pan3 rrr matches pan1 rrr of obs 3.
Obs 7 has no values, so it gets Hid 3.
In obs 8, pan2 www matches pan3 www of obs 2, so it gets Hid 1.
In obs 9, add1 eee matches pan2 eee of obs 6, so it gets Hid 1.
These are the linkages I want.
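The linkage rule described above can be sketched outside SAS to make the expected result concrete. The following is a minimal Python sketch, not a SAS solution, using a union-find (disjoint-set) structure. The function name `assign_hid` is hypothetical, and it assumes that blanks and "." are missing values that never match, that a value matches regardless of which column it appears in, and that Hid numbers are handed out in order of first appearance.

```python
def assign_hid(rows, missing=("", ".")):
    """Return one group id per row; rows sharing any non-missing value share an id."""
    parent = list(range(len(rows)))  # each row starts as its own group

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # value -> index of the first row that contained it
    for i, row in enumerate(rows):
        for value in row:
            if value in missing:  # assumption: missing values never link rows
                continue
            if value in first_seen:
                a, b = find(i), find(first_seen[value])
                if a != b:
                    parent[max(a, b)] = min(a, b)  # merge the two groups
            else:
                first_seen[value] = i

    # Number the groups 1, 2, 3, ... in order of first appearance.
    hid_of_root, hids = {}, []
    for i in range(len(rows)):
        root = find(i)
        if root not in hid_of_root:
            hid_of_root[root] = len(hid_of_root) + 1
        hids.append(hid_of_root[root])
    return hids


rows = [
    ("aaa", "bbb", "ccc", "ddd"),
    ("qqq", "rrr", "www", "aaa"),
    ("rrr", "ppp", "mmm", "lll"),
    ("uuu", "zzz", "ffff", "ppp"),
    ("p", "l", "m", "n"),
    ("jjjj", "eee", "rrr", "ooo"),
    ("", "", "", ""),
    ("sss", "www", ".", "."),
    (".", ".", ".", "eee"),
]
print(assign_hid(rows))  # [1, 1, 1, 1, 2, 1, 3, 1, 1]
```

Under those assumptions the sketch reproduces the hid column of the example, including Hid 3 for the all-blank obs 7.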
The approach illustrated above is sequential in nature, but I don't think it covers all potential problems. In particular, what happens when the "linkage info" shows up later in the process? Example:
k1  k2  k3  k4  hid
aa  bb  cc  dd  1    all new
ee  ff  gg  hh  2    no overlap with previous
ii  jj  kk  ll  3    no overlap yet
aa  ff  cc  hh  ??   records 1 & 2, which were distinct, now share a linkage due to this record
When the last record shows up, binding records 1, 2, and 4 (they are all connected now), do you want hid for all 3 records to be 1 or not? I would assume you do want them all to be hid 1.
This issue of linkage showing up after assignment is typical in finding connected subgraphs (householding, connections, etc.). This is not solvable in general through sorting, which is why it is always an iterative process that runs until the assignments become stable, i.e., no more group formation is possible.
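To illustrate what "iterate until the assignments become stable" could look like, here is a minimal Python sketch (not SAS, and not a hash-object solution). The function name `stabilize_groups` is hypothetical, and the sketch assumes every key value is non-missing and that a value links records regardless of which column holds it. Each pass gives every record the smallest label found among the records that share a key with it; the loop stops when a full pass changes nothing.

```python
def stabilize_groups(records):
    """Relabel records pass after pass until no label changes."""
    labels = list(range(len(records)))  # every record starts in its own group
    changed = True
    while changed:
        changed = False
        # Smallest current label among the records holding each key value.
        lowest = {}
        for label, rec in zip(labels, records):
            for key in rec:
                lowest[key] = min(label, lowest.get(key, label))
        # Pull each record down to the smallest label reachable via its keys.
        for i, rec in enumerate(records):
            best = min((lowest[key] for key in rec), default=labels[i])
            if best < labels[i]:
                labels[i] = best
                changed = True
    # Renumber the surviving labels 1, 2, 3, ... in order of first appearance.
    hid = {}
    return [hid.setdefault(label, len(hid) + 1) for label in labels]


records = [
    ("aa", "bb", "cc", "dd"),
    ("ee", "ff", "gg", "hh"),
    ("ii", "jj", "kk", "ll"),
    ("aa", "ff", "cc", "hh"),
]
print(stabilize_groups(records))  # [1, 1, 2, 1]
```

On this example the first pass attaches record 4 to record 1, the second pass pulls record 2 into the same group through ff and hh, and the third pass finds nothing left to change.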
If you can describe what you desire in more detail - specifically covering all possible cases - then perhaps some sharp person familiar with hash objects can point you in the right direction.
obs  k1  k2  k3  k4  hid
1    aa  bb  cc  dd  1
2    ee  ff  gg  hh  1
3    ii  jj  kk  ll  2
4    aa  ff  cc  hh  1
Obs 1, 2, and 4 should all get Hid 1. The first row gets Hid 1. In obs 4, aa repeats in k1, so obs 4 is linked to obs 1. Obs 4 also has ff next to aa, and ff appears in obs 2, so obs 2 is linked as well and should also get Hid 1.
Suggest this thread is not necessary since you've got two other discussions on exactly the same thing.
Interesting thought though! Could one of the clustering algorithms be used to help solve the problem? Given the number of clusters that would be needed, I doubt any of us has a sufficiently powerful machine to find out, but it is still an interesting thought.