Frequent Contributor
Posts: 140


Actually, my goal is to find the linkages: if a pan1 value is repeated in pan2, pan3, or add1 of another observation, those observations should get one hid. It is not only that case, as the example shows:

obs  pan1  pan2  pan3  add1  hid
1    aaa   bbb   ccc   ddd   1
2    qqq   rrr   www   aaa   1
3    rrr   ppp   mmm   lll   1
4    uuu   zzz   ffff  ppp   1
5    p     l     m     n     2
6    jjjj  eee   rrr   ooo   1
7    <all blanks>            3
8    sss   www   .     .     1
9    .     .     .     eee   1

In this example, obs 1 is the first observation, so its hid is 1.

In obs 2, add1 aaa matches pan1 aaa of obs 1.

In obs 3, pan1 rrr matches pan2 rrr of obs 2.

In obs 4, add1 ppp matches pan2 ppp of obs 3.

Obs 5 is unique, with no match, so it gets hid 2.

In obs 6, pan3 rrr matches pan1 rrr of obs 3.

Obs 7 has no values, so it gets hid 3.

In obs 8, pan2 www matches pan3 www of obs 2, so it gets hid 1.

In obs 9, add1 eee matches pan2 eee of obs 6, so it gets hid 1.

These are the linkages I want.
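The rule described above amounts to putting two observations in the same group whenever they share any non-blank value, directly or through a chain of other observations. Below is a minimal Python sketch of that idea using a union-find structure. It is not SAS code and not anyone's posted solution. The names `link_groups` and `rows` are made up for illustration, and it assumes that values are compared regardless of column, that blanks (written as `None`) never link anything, and that an all-blank observation gets its own hid.

```python
def link_groups(rows):
    """Return one hid per row; rows sharing any non-blank value share a hid."""
    parent = list(range(len(rows)))  # union-find forest, one node per row

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # value -> index of the first row that contained it
    for i, row in enumerate(rows):
        for value in row:
            if not value:  # assumption: blanks never create a linkage
                continue
            if value in first_seen:
                a, b = find(i), find(first_seen[value])
                if a != b:
                    parent[max(a, b)] = min(a, b)  # merge the two groups
            else:
                first_seen[value] = i

    # Number the groups 1, 2, 3, ... in order of first appearance.
    hid_of_root = {}
    hids = []
    for i in range(len(rows)):
        root = find(i)
        if root not in hid_of_root:
            hid_of_root[root] = len(hid_of_root) + 1
        hids.append(hid_of_root[root])
    return hids


# The nine observations from the example (pan1, pan2, pan3, add1).
rows = [
    ("aaa", "bbb", "ccc", "ddd"),
    ("qqq", "rrr", "www", "aaa"),
    ("rrr", "ppp", "mmm", "lll"),
    ("uuu", "zzz", "ffff", "ppp"),
    ("p", "l", "m", "n"),
    ("jjjj", "eee", "rrr", "ooo"),
    (None, None, None, None),
    ("sss", "www", None, None),
    (None, None, None, "eee"),
]
print(link_groups(rows))
```

Because groups are merged as soon as a shared value is seen, a linkage that appears late in the data still pulls earlier observations into the same group.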

Frequent Contributor
Posts: 104

Re: Reg:Clustering

The approach illustrated above is sequential in nature, but I don't think it covers all potential problems. In particular, what happens when the "linkage info" shows up later in the process? Example:

k1  k2  k3  k4  hid
aa  bb  cc  dd  1    all new
ee  ff  gg  hh  2    no overlap with previous
ii  jj  kk  ll  3    no overlap yet
aa  ff  cc  hh  ??   records 1 and 2, which were distinct, now share a linkage due to this record

When the last record shows up, binding records 1, 2, and 4 (they are all connected now), do you want the hid for all 3 records to be 1 or not? I would assume you do want them all to be hid 1.

This issue of a linkage showing up after assignment is typical in finding connected subgraphs (householding, connections, etc.). It is not solvable in general through sorting. That's why this is always an iterative process that runs until the assignments become stable, i.e., until no more group formation is possible.
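To make the "iterate until stable" idea concrete, here is a minimal Python sketch (not SAS, and only one possible way to do it). The function name `propagate_hids` and the variable `records` are hypothetical. It assumes values are compared regardless of which column they sit in and that blank values (`None` or empty strings) never link records.

```python
def propagate_hids(records):
    """Relabel records repeatedly until no group assignment changes."""
    hid = list(range(len(records)))  # start with every record in its own group
    changed = True
    while changed:
        changed = False
        # For each value, find the smallest hid among the records carrying it.
        lowest = {}
        for i, rec in enumerate(records):
            for value in rec:
                if value:  # assumption: blanks do not link records
                    lowest[value] = min(lowest.get(value, hid[i]), hid[i])
        # Each record adopts the smallest hid reachable through its values.
        for i, rec in enumerate(records):
            best = min([hid[i]] + [lowest[v] for v in rec if v])
            if best < hid[i]:
                hid[i] = best
                changed = True
    # Renumber the surviving labels 1, 2, 3, ... in order of first appearance.
    renumber = {}
    return [renumber.setdefault(h, len(renumber) + 1) for h in hid]


# The four-record example: the last record connects records 1 and 2.
records = [
    ("aa", "bb", "cc", "dd"),
    ("ee", "ff", "gg", "hh"),
    ("ii", "jj", "kk", "ll"),
    ("aa", "ff", "cc", "hh"),
]
print(propagate_hids(records))
```

On this data the loop needs more than one pass: the first pass pulls record 4 into record 1's group, and only the second pass pulls record 2 in through the shared `ff`. A single sorted pass would miss that.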

If you can describe what you desire in more detail - specifically covering all possible cases - then perhaps some sharp ;-) person familiar with hash objects can point you in the right direction.

Frequent Contributor
Posts: 140


obs  k1  k2  k3  k4  hid
1    aa  bb  cc  dd  1
2    ee  ff  gg  hh  1
3    ii  jj  kk  ll  2
4    aa  ff  cc  hh  1

Obs 1, 2, and 4 should all get hid 1. The first row is 1. In obs 4, aa is repeated in k1, so obs 4 is linked to obs 1. Next to aa in obs 4 is ff, which also appears in obs 2, so obs 2 has a linkage to aa and should also get 1.

Frequent Contributor
Posts: 104


Suggest this thread is not necessary since you've got two other discussions on exactly the same thing.

Posts: 7,363


Interesting thought though! Could one of the clustering algorithms be used to help solve the problem? Given the number of clusters that would be needed, I doubt any of us has a sufficiently powerful machine to find out, but it's still an interesting thought.
