@LinusH wrote
Only a real world test will tell, but I would be REALLY surprised if a loop of SQL joins would beat a hash table look-up, especially if the depth > 3 levels
Code below generates a test data set with a million observations . The depth depends on the probabilities used in the select statement. With the numbers below I consistently get 6 levels.
Using this as input data , the SQL loop runs in a few seconds. I can't give a precise benchmark as I an running on SAS On Demand.
data have;
parent=1;component=2;output;
do i=1 to 1000000;
c=rand("uniform");
select;
when (c<0.05) do;parent=component+1; component=parent+1;end; *new family;
when (c<0.8) do;parent=parent+1; component=component+1;end; *new generation, same family;
otherwise component=component+1;* new child, same parent;
end;
output;
end;
run;
Thanks @LinusH rooting for me.
The best choice is using SAS/OR if you have it, @RobPratt might give you some code.
Or check SAS/OR documentation ,I remembered there is already an example about Connection Component.
data have;
infile cards ;
input from $ to $ ;
cards;
1 2
1 3
4 5
5 2
9 4
6 7
8 7
;
run;
data full;
set have end=last;
if _n_ eq 1 then do;
declare hash h();
h.definekey('node');
h.definedata('node');
h.definedone();
end;
output;
node=from; h.replace();
from=to; to=node;
output;
node=from; h.replace();
if last then h.output(dataset:'node');
drop node;
run;
data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();
if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
from_to.definekey('from');
from_to.definedata('to');
from_to.definedone();
if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
no.definekey('node');
no.definedata('node');
no.definedone();
do while(hi_no.next()=0);
household+1; output;
count=1;
key=node;_ha.add();
last=node;ha.add();
rc=hi.first();
do while(rc=0);
from=last;rx=from_to.find();
do while(rx=0);
key=to;ry=_ha.check();
if ry ne 0 then do;
node=to;output;rr=no.remove(key:node);
key=to;_ha.add();
count+1;
last=to;ha.add();
end;
rx=from_to.find_next();
end;
rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
Hi @Ksharp . It may be just when I run it, but the hash table solution doesn't match the output described in the original post with @Minutemaid23's data.
This works . The question will be how this performs over your 1.2 million records, and if there is a more efficient way.
data have;
input parent component;
datalines;
1 2
1 3
1 4
2. 5
2. 6
6. 7
;
%macro loop;
%do %until (&count=0);
proc sql;
create table have as
select coalesce(b.parent,a.parent) as parent , coalesce(a.component,b.component) as component
from have a left join have b on b.component=a.parent;
run;
* if there are no components in parent then done;
proc sql ;
select count(*) into :count
from have a join have b on b.component=a.parent;
run;
%end;
%mend;
%loop;
Only a real world test will tell, but I would be REALLY surprised if a loop of SQL joins would beat a hash table look-up, especially if the depth > 3 levels.
@LinusH wrote
Only a real world test will tell, but I would be REALLY surprised if a loop of SQL joins would beat a hash table look-up, especially if the depth > 3 levels
Code below generates a test data set with a million observations . The depth depends on the probabilities used in the select statement. With the numbers below I consistently get 6 levels.
Using this as input data , the SQL loop runs in a few seconds. I can't give a precise benchmark as I an running on SAS On Demand.
data have;
parent=1;component=2;output;
do i=1 to 1000000;
c=rand("uniform");
select;
when (c<0.05) do;parent=component+1; component=parent+1;end; *new family;
when (c<0.8) do;parent=parent+1; component=component+1;end; *new generation, same family;
otherwise component=component+1;* new child, same parent;
end;
output;
end;
run;
Well, after a lengthy battle with my IT department i discovered that we do not have a SAS metadata server that i can connect to with JMP. Is there anyway to run macro's in SAS JMP? I haven't found a way with my research so far, and i have been looking everywhere.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.