DATA Step, Macro, Functions and more

Clustering Links

Accepted Solution Solved
Reply
Super Contributor
Posts: 316
Accepted Solution

Clustering Links

I would like to seek your expert advice how do i attain the result table by using the source. I would want to group the ids which are linked together.

 

As you can see here id 5 is actually both linked to id 1 and 4, but the score of the link between 1 and 5 is higher. Reason that on the desired result it 5 will be under group 1

 

Greatly appreciate your inputs.

 

data source;
input id1 id2 score;	
datalines;
1	2	8
1	3	8
3	5	10
5	4	7
4	6	8
;
		
		
data result;		
input group	id;
datalines;
1	1	
1	2	
1	3	
1	5	
2	4	
2	6	
;

Accepted Solutions
Solution
‎04-19-2016 11:19 PM
Super User
Posts: 9,662

Re: Clustering Links

That would lead to a more simple question.

 

 


data have;
infile cards ;
input from $  to $ ;
cards;
1 2
1 3
3 5
5 4
4 6
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

View solution in original post


All Replies
Super User
Posts: 9,662

Re: Clustering Links

Why would 4 and 6 go into another group ? because 7 < 8 ?

Super Contributor
Posts: 316

Re: Clustering Links

Yes, The id will go to the group where it has a higher score in it.

Trusted Advisor
Posts: 1,115

Re: Clustering Links

Hello @milts,

 

Recently, I contributed some code to a discussion of a very similar question (except that no scores were involved there). Please see the thread Recursive lookup for ID's.

Super User
Posts: 9,662

Re: Clustering Links

[ Edited ]

So 

 node: 1->3->5->4

score :    8->10->7 

7 less than 10 ,therefore drop 4 ,right ?

 

NOTE: change my code  

multidata&colon;'y'

into

m........ta : 'y' 

 

It seems this forum run into some problem.

 

UPDATED

 


data have;
infile cards expandtabs truncover ;
input from $ to $ score ;
cards;
1 2 8
1 3 8
3 5 10
5 4 7
4 6 8
;
run;




data _null_;
if _n_ eq 1 then do;
if 0 then set have(rename=(score=_score));
declare hash k(dataset:'have(rename=(score=_score)');
k.definekey('from');
k.definedata('_score');
k.definedone();

declare hash h();
h.definekey('drop');
h.definedata('drop');
h.definedone();
end;
set have end=last;
_score=.;
rc=k.find(key:to);
if _score lt score and not missing(_score) then do;drop=to;h.replace();end;
if last then h.output(dataset:'drop_obs');
run;

data full;
if _n_ eq 1 then do;
if 0 then set drop_obs;
declare hash k(dataset:'drop_obs');
k.definekey('drop');
k.definedone();

declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
set have end=last;
if k.check(key:from) ne 0 then output;
node=from; h.replace();
from=to; to=node;
if k.check(key:to) ne 0 then output;
node=from; h.replace();

if last then h.output(dataset:'node');
drop node drop score;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing
and to is not missing ))',
hashexp:20,multidata&colon;'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();


do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

 

Super Contributor
Posts: 316

Re: Clustering Links

Cool! Couldn't thank you enough as this really helps a lot.

 

But let's say removing the score, and I just want to get the linkings how do i implement it using hash objects? 

 

So given the data sample links:

 

1->2

1->3

3->5

5->4

4->6

 

What portion of the code below should I tweak so that the result of this links is that they will fall under the same household?

 

Thanks in advance!

All wil 

 

 

Solution
‎04-19-2016 11:19 PM
Super User
Posts: 9,662

Re: Clustering Links

That would lead to a more simple question.

 

 


data have;
infile cards ;
input from $  to $ ;
cards;
1 2
1 3
3 5
5 4
4 6
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata&colon;'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
Super Contributor
Posts: 316

Re: Clustering Links

Thanks a lot! This really cut down my script from running for a day to a couple of minutes!

Super Contributor
Posts: 316

Re: Clustering Links

I guess you're example here works for me. Let me try this one out.

 

https://communities.sas.com/t5/Base-SAS-Programming/Recursive-lookup-for-ID-s/td-p/259635/page/3

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 491 views
  • 2 likes
  • 3 in conversation