@LinusH wrote
Only a real world test will tell, but I would be REALLY surprised if a loop of SQL joins would beat a hash table look-up, especially if the depth > 3 levels
Code below generates a test data set with a million observations. The depth depends on the probabilities used in the SELECT statement; with the numbers below I consistently get 6 levels.
Using this as input data, the SQL loop runs in a few seconds. I can't give a precise benchmark as I am running on SAS On Demand.
data have;
  parent=1; component=2; output;
  do i=1 to 1000000;
    c=rand("uniform");
    select;
      when (c<0.05) do; parent=component+1; component=parent+1; end; * new family;
      when (c<0.8)  do; parent=parent+1; component=component+1; end; * new generation, same family;
      otherwise component=component+1;                               * new child, same parent;
    end;
    output;
  end;
run;
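For readers without a SAS session handy, the same generator can be sketched in Python. This is a minimal illustration, not the DATA step itself; the function name and the seed are mine, and the probability thresholds mirror the SELECT statement above (0.05 new family, 0.75 new generation, 0.20 new child):

```python
import random

def make_test_data(n, p_new_family=0.05, p_new_generation=0.75, seed=1):
    # Mirrors the DATA step: each draw either starts a new family,
    # steps down a generation, or adds a sibling under the same parent.
    rng = random.Random(seed)
    parent, component = 1, 2
    rows = [(parent, component)]
    for _ in range(n):
        c = rng.random()
        if c < p_new_family:
            parent = component + 1
            component = parent + 1
        elif c < p_new_family + p_new_generation:
            parent, component = parent + 1, component + 1
        else:
            component += 1
        rows.append((parent, component))
    return rows
```

Raising `p_new_generation` relative to the other two probabilities deepens the chains, which is what drives the number of iterations the SQL loop needs.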
Thanks @LinusH for rooting for me.
The best choice is using SAS/OR if you have it; @RobPratt might give you some code.
Or check the SAS/OR documentation; I remember there is already an example about connected components.
data have;
  infile cards;
  input from $ to $;
cards;
1 2
1 3
4 5
5 2
9 4
6 7
8 7
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
    declare hash h();          /* collects the distinct node list */
    h.definekey('node');
    h.definedata('node');
    h.definedone();
  end;
  /* write each edge in both directions so the graph is undirected */
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;
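The step above doubles each edge (adding the reverse direction) and writes out the distinct node list. For readers less familiar with SAS hash objects, the same preprocessing can be sketched in Python; names here are illustrative, not from the SAS code:

```python
# Build an undirected adjacency list and a distinct node set from
# directed (from, to) edges -- the same preprocessing the DATA step does.
def symmetrize(edges):
    adjacency = {}   # node -> list of neighbours (both directions)
    nodes = set()    # distinct nodes, like the 'node' hash object
    for frm, to in edges:
        adjacency.setdefault(frm, []).append(to)
        adjacency.setdefault(to, []).append(frm)
        nodes.update((frm, to))
    return adjacency, nodes

edges = [("1","2"), ("1","3"), ("4","5"), ("5","2"), ("9","4"), ("6","7"), ("8","7")]
adj, nodes = symmetrize(edges)
```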
data want(keep=node household);
  /* ha: nodes of the current household, in insertion order */
  declare hash ha(ordered:'a');
  declare hiter hi('ha');
  ha.definekey('count');
  ha.definedata('last');
  ha.definedone();
  /* _ha: fast "already visited" check */
  declare hash _ha(hashexp: 20);
  _ha.definekey('key');
  _ha.definedone();
  if 0 then set full;
  /* from_to: adjacency list, with duplicate keys allowed */
  declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
  from_to.definekey('from');
  from_to.definedata('to');
  from_to.definedone();
  if 0 then set node;
  /* no: distinct nodes still waiting for a household id */
  declare hash no(dataset:'node');
  declare hiter hi_no('no');
  no.definekey('node');
  no.definedata('node');
  no.definedone();

  do while(hi_no.next()=0);
    /* seed a new household with the next unassigned node */
    household+1; output;
    count=1;
    key=node; _ha.add();
    last=node; ha.add();
    /* sweep the growing household: follow edges from every collected node */
    rc=hi.first();
    do while(rc=0);
      from=last; rx=from_to.find();
      do while(rx=0);
        key=to; ry=_ha.check();
        if ry ne 0 then do;   /* neighbour not seen yet */
          node=to; output; rr=no.remove(key:node);
          key=to; _ha.add();
          count+1;
          last=to; ha.add();
        end;
        rx=from_to.find_next();
      end;
      rc=hi.next();
    end;
    ha.clear(); _ha.clear();
  end;
  stop;
run;
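Stripped of the hash-object mechanics, the step above is a breadth-first traversal that stamps every node in a connected component with the same household id. A hedged Python equivalent (function and variable names are mine, not from the SAS code):

```python
from collections import deque

def households(edges):
    # Undirected adjacency from directed edges, as in the 'full' data set.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    label, seen, hh = {}, set(), 0
    for seed in adj:                 # like iterating the 'node' hash
        if seed in seen:
            continue
        hh += 1                      # like household+1
        queue = deque([seed])
        seen.add(seed)
        while queue:                 # like walking 'ha' with the hi iterator
            cur = queue.popleft()
            label[cur] = hh
            for nxt in adj[cur]:
                if nxt not in seen:  # like the _ha.check() test
                    seen.add(nxt)
                    queue.append(nxt)
    return label

lab = households([("1","2"),("1","3"),("4","5"),("5","2"),("9","4"),("6","7"),("8","7")])
```

On the seven sample edges this yields two households: {1,2,3,4,5,9} and {6,7,8}.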
Hi @Ksharp. It may just be when I run it, but the hash table solution doesn't match the output described in the original post with @Minutemaid23's data.
This works. The question will be how it performs over your 1.2 million records, and whether there is a more efficient way.
data have;
  input parent component;
datalines;
1 2
1 3
1 4
2 5
2 6
6 7
;
%macro loop;
  %do %until (&count=0);
    /* replace each parent with its own parent where one exists */
    proc sql;
      create table have as
        select coalesce(b.parent,a.parent) as parent,
               coalesce(a.component,b.component) as component
        from have a left join have b on b.component=a.parent;
    quit;
    /* if no component still appears as a parent, we are done */
    proc sql noprint;
      select count(*) into :count
        from have a join have b on b.component=a.parent;
    quit;
  %end;
%mend;
%loop;
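Each pass of the macro replaces a row's parent with that parent's own parent, so the loop converges to the fixed point where every component points at its ultimate root. A small Python sketch of the same fixed point, computed by climbing the parent links directly (names are illustrative, not from the SQL):

```python
def climb_to_root(pairs):
    # parent-of map built from (parent, component) pairs
    parent_of = {comp: par for par, comp in pairs}
    roots = {}
    for comp in parent_of:
        top = parent_of[comp]
        # follow parent links until the top has no parent itself --
        # the state the SQL loop's repeated self-join converges to
        while top in parent_of:
            top = parent_of[top]
        roots[comp] = top
    return roots

roots = climb_to_root([(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (6, 7)])
```

With the six sample rows every component resolves to root 1, matching what the macro leaves in HAVE once the join count hits zero.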
Well, after a lengthy battle with my IT department I discovered that we do not have a SAS Metadata Server that I can connect to with JMP. Is there any way to run macros in JMP? I haven't found a way in my research so far, and I have been looking everywhere.