Re: iterative join to accumulate records from a second table (linear-r...

desertsp2 · Posted 11-08-2018 04:06 PM

I have two datasets which need to be “combined” in a certain way that goes beyond a simple join/merge.

One dataset represents a roadway network consisting of links connected by nodes. Below is a representation of a roadway that splits at a “Y” intersection. A driver on link 100 can continue onto link 101 or 201. 101 is a dead-end, but 201 connects to 202, which connects to 203. The dataset is tens of thousands of rows.

LINK_ID	FROM_NODE	TO_NODE	LINK_TOTAL_LENGTH
100	1	2	1.1
101	2	3	0.8
201	2	90	0.2
202	90	91	0.4
203	91	92	1.0

The other dataset represents points on the roadway network. These will serve as starting points for trips. Here is one record that indicates a starting point 0.3 units into the link. This dataset is a few thousand rows.

START_POINT_ID	LINK_ID	DISTANCE_INTO_LINK
10001	100	0.3

My target deliverable is a third table that lists all links within a certain distance of the starting point. In this case, the distance is 1.5 units. I will need to do a bit of math to calculate the last three columns, in between each iteration.

START_POINT_ID	LINK_ID	START_POSITION_ON_LINK	END_POSITION_ON_LINK	CUMULATIVE_DISTANCE
10001	100	0.3	1.1	0.8
10001	101	0.0	0.7	1.5
10001	201	0.0	0.2	1.0
10001	202	0.0	0.4	1.4
10001	203	0.1	0.1	1.5

Conceptually simple, but I’m not sure of the best way to code this. The logical approach is to just perform datastep merges over and over (using a macro) to keep finding the next LINK(s) based on matching the nodes, until the cumulative distance reaches the specified distance (1.5 units), then start over for the next START_POINT_ID. But that requires a lot of overhead.

Is this "iterative conditional lookup" something that could be done within a single datastep?

Thanks!

desertsp2 · Posted 11-08-2018 04:33 PM

In case my example is too specific, this is probably similar to what I'm doing.

https://en.wikipedia.org/wiki/Connected_component_(graph_theory)

Edit: @Reeza to fix link

Reeza · Posted 11-08-2018 04:37 PM

This is a more complex problem, and I think you'll get the best solution if you post a really clear example. A small data set that shows what you have and EXACTLY what output would be expected from that input.

desertsp2 · Posted 11-08-2018 04:47 PM

My initial post shows the exact data and exact results. I could expand to include additional records.

Would you suggest I create dataline code so responders can easily work with actual SAS datasets?

FreelanceReinh · Posted 11-08-2018 04:47 PM

Hi @desertsp2,

Do you know PGStats's SubGraphs macro? Please see How to find all connected components in a graph. (Just as a starting point.) There are also several threads about similar problems in the community archives. "subgraphs" is a suitable keyword to find some of them.

desertsp2 · Posted 11-08-2018 05:41 PM

Thanks! I'll do some further research and see if this has already been addressed.

s_lassen · Posted 11-09-2018 04:30 AM

Something like this may work:

proc sql;
  create index from_node on links(from_node);
quit;

data want;
  merge points(in=ok) links;
  by link_id;
  if ok;
  start_position_on_link=distance_into_link;
  end_position_on_link=link_total_length;
  cumulative_distance=link_total_length-distance_into_link;
  keep start_point_id link_id start_position_on_link end_position_on_link cumulative_distance from_node to_node;
run;

data want;
  modify want;
  from_node=to_node;
  distance_so_far=cumulative_distance;
  do from_node=.,to_node;
    do until(0);
      set links key=from_node;
      if _iorc_ then leave;
      start_position_on_link=0;
      end_position_on_link=link_total_length;
      cumulative_distance=distance_so_far+link_total_length;
      output;
      end;
   end;
  _error_=0;
run;

- except that I do not understand your calculation of the cumulated distance (nor why the start_position_on_link should be 0.1 on the last obs), so I made that up.

The idea is to make an iterative travel through the tree by adding nodes in the end of the table, which will then be read iteratively. The _error_ variable is set to 0 after the loop ends, because it would create an error message every time there are no more nodes to read.

Edit: I put in the outer loop (do from_node=.,to_node) to make the datastep reread the whole set of nodes, if there happens to be a point lying on the ending link of the last link read from the previous point.

Ksharp · Posted 11-09-2018 07:40 AM

The following code could get connected component.


data have;
infile cards ;
input from $  to $ ;
cards;
1     2
1     3
4     5
5     2
9     4
6     7
8     7
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

iterative join to accumulate records from a second table (linear-referencing)?

Re: iterative join to accumulate records from a second table (linear-referencing)?

Re: iterative join to accumulate records from a second table (linear-referencing)?

Re: iterative join to accumulate records from a second table (linear-referencing)?

Re: iterative join to accumulate records from a second table (linear-referencing)?

Re: iterative join to accumulate records from a second table (linear-referencing)?

Re: iterative join to accumulate records from a second table (linear-referencing)?

Re: iterative join to accumulate records from a second table (linear-referencing)?

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!