BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Minutemaid23
Calcite | Level 5
Hello all,

I am really struggling with this one. I’m not sure if a macro would be the best option. Here is my issue.

I have 1 table with two columns. Parent and component. I am trying to write a program that does the following.

Beginning table
Parent component
1 2
1 3
1 4
2. 5
2. 6
6. 7

Into this
Parent component
1. 2
1. 3
1. 4
1. 5
1. 6
1. 7

Basically if a component is in the parent column (like 2) I want to pull any components of 2.

The part that I am really struggling with is getting this to loop. I need sas to look for 6 in the parent column since it was a component of 2 and continue until there are no matches.

This table has about 1.2 million records. Some parents have 30 components and some have much less.

Any help is appreciated. I hope I explained this well enough.
1 ACCEPTED SOLUTION

Accepted Solutions
JohnHoughton
Quartz | Level 8

@LinusH wrote

Only a real world test will tell, but I would be REALLY surprised if a loop of SQL joins would beat a hash table look-up, especially if the depth > 3 levels

Code below generates a test data set with a million observations . The depth depends on the probabilities used in the select statement. With the numbers below I consistently get 6 levels.

 

Using this as input data , the SQL loop runs in a few seconds. I can't give a precise benchmark as I an running on SAS On Demand.

 

 

data have;
parent=1;component=2;output;
do i=1 to 1000000;
	c=rand("uniform");
	select;
		when (c<0.05) do;parent=component+1; component=parent+1;end; *new family;
		when (c<0.8) do;parent=parent+1; component=component+1;end; *new generation, same family;
		otherwise component=component+1;* new child, same parent;
	end;
	output;
end;
run;

View solution in original post

8 REPLIES 8
LinusH
Tourmaline | Level 20
I don't think that a macro would help here, since I believe that dynamic syntax will soulve the problem.
I had quite a similar task a couple of years back, and I used a format based on the table (source value component, formatted value patent).
And the a do while/until loop where you apply the format until you reach the top node (that is when you get the "other" value returned).

I believe you could solve this with a hash table in a similar way, but that is unfortunately not a part of my arsenal. @Ksharp to the rescue? 😉
Data never sleeps
Ksharp
Super User

Thanks @LinusH rooting for me.

The best choice is using SAS/OR if you have it, @RobPratt might give you some code.

Or check SAS/OR documentation ,I remembered there is already an example about Connection Component.

 


data have;
infile cards ;
input from $  to $ ;
cards;
1     2
1     3
4     5
5     2
9     4
6     7
8     7
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;
JohnHoughton
Quartz | Level 8

Hi @Ksharp . It may be just when I run it, but the hash table solution doesn't match the output described in the original post with @Minutemaid23's data. 

JohnHoughton
Quartz | Level 8

This works . The question will be how this performs over your 1.2 million records, and if there is a more efficient way.

 

data have;
input parent component;
datalines;
1 2
1 3
1 4
2. 5
2. 6
6. 7
;


%macro loop;
%do %until (&count=0);

	proc sql;
	create table have as
	select   coalesce(b.parent,a.parent) as parent , coalesce(a.component,b.component) as component
	from have a  left join have b on b.component=a.parent;
	run;
	* if there are no components in parent then done;
	proc sql ;
	select count(*) into :count
	from have a   join have b on b.component=a.parent;
	run;

%end;
%mend;
%loop;

 

LinusH
Tourmaline | Level 20

Only a real world test will tell, but I would be REALLY surprised if a loop of SQL joins would beat a hash table look-up, especially if the depth > 3 levels.

Data never sleeps
JohnHoughton
Quartz | Level 8

@LinusH wrote

Only a real world test will tell, but I would be REALLY surprised if a loop of SQL joins would beat a hash table look-up, especially if the depth > 3 levels

Code below generates a test data set with a million observations . The depth depends on the probabilities used in the select statement. With the numbers below I consistently get 6 levels.

 

Using this as input data , the SQL loop runs in a few seconds. I can't give a precise benchmark as I an running on SAS On Demand.

 

 

data have;
parent=1;component=2;output;
do i=1 to 1000000;
	c=rand("uniform");
	select;
		when (c<0.05) do;parent=component+1; component=parent+1;end; *new family;
		when (c<0.8) do;parent=parent+1; component=component+1;end; *new generation, same family;
		otherwise component=component+1;* new child, same parent;
	end;
	output;
end;
run;
Minutemaid23
Calcite | Level 5
Thanks so much! I am not able to run until I’m back in the office on Monday. I will provide feedback as soon as possible.
Minutemaid23
Calcite | Level 5

Well, after a lengthy battle with my IT department i discovered that we do not have a SAS metadata server that i can connect to with JMP. Is there anyway to run macro's in SAS JMP? I haven't found a way with my research so far, and i have been looking everywhere.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2249 views
  • 2 likes
  • 4 in conversation