BookmarkSubscribeRSS Feed
desertsp2
Calcite | Level 5

I have two datasets which need to be “combined” in a certain way that goes beyond a simple join/merge.

 

One dataset represents a roadway network consisting of links connected by nodes. Below is a representation of a roadway that splits at a “Y” intersection. A driver on link 100 can continue onto link 101 or 201. 101 is a dead-end, but 201 connects to 202, which connects to 203. The dataset is tens of thousands of rows.

LINK_ID

FROM_NODE

TO_NODE

LINK_TOTAL_LENGTH

100

1

2

1.1

101

2

3

0.8

201

2

90

0.2

202

90

91

0.4

203

91

92

1.0

 

 

The other dataset represents points on the roadway network. These will serve as starting points for trips. Here is one record that indicates a starting point 0.3 units into the link. This dataset is a few thousand rows.

START_POINT_ID

LINK_ID

DISTANCE_INTO_LINK

10001

100

0.3

 

 

My target deliverable is a third table that lists all links within a certain distance of the starting point. In this case, the distance is 1.5 units. I will need to do a bit of math to calculate the last three columns, in between each iteration.

START_POINT_ID

LINK_ID

START_POSITION_ON_LINK

END_POSITION_ON_LINK

CUMULATIVE_DISTANCE

10001

100

0.3

1.1

0.8

10001

101

0.0

0.7

1.5

10001

201

0.0

0.2

1.0

10001

202

0.0

0.4

1.4

10001

203

0.1

0.1

1.5

 

 

Conceptually simple, but I’m not sure of the best way to code this. The logical approach is to just perform datastep merges over and over (using a macro) to keep finding the next LINK(s) based on matching the nodes, until the cumulative distance reaches the specified distance (1.5 units), then start over for the next START_POINT_ID. But that requires a lot of overhead.

 

Is this "iterative conditional lookup" something that could be done within a single datastep?

 

Thanks!

7 REPLIES 7
desertsp2
Calcite | Level 5

In case my example is too specific, this is probably similar to what I'm doing.

 

https://en.wikipedia.org/wiki/Connected_component_(graph_theory)

 

 

Edit: @Reeza to fix link
Reeza
Super User
This is a more complex problem, and I think you'll get the best solution if you post a really clear example. A small data set that shows what you have and EXACTLY what output would be expected from that input.
desertsp2
Calcite | Level 5

My initial post shows the exact data and exact results. I could expand to include additional records.

 

Would you suggest I create dataline code so responders can easily work with actual SAS datasets?

FreelanceReinh
Jade | Level 19

Hi @desertsp2,

 

Do you know PGStats's SubGraphs macro? Please see How to find all connected components in a graph. (Just as a starting point.) There are also several threads about similar problems in the community archives. "subgraphs" is a suitable keyword to find some of them.

desertsp2
Calcite | Level 5

Thanks! I'll do some further research and see if this has already been addressed.

s_lassen
Meteorite | Level 14

Something like this may work:

proc sql;
  create index from_node on links(from_node);
quit;

data want;
  merge points(in=ok) links;
  by link_id;
  if ok;
  start_position_on_link=distance_into_link;
  end_position_on_link=link_total_length;
  cumulative_distance=link_total_length-distance_into_link;
  keep start_point_id link_id start_position_on_link end_position_on_link cumulative_distance from_node to_node;
run;

data want;
  modify want;
  from_node=to_node;
  distance_so_far=cumulative_distance;
do from_node=.,to_node; do until(0); set links key=from_node; if _iorc_ then leave; start_position_on_link=0; end_position_on_link=link_total_length; cumulative_distance=distance_so_far+link_total_length; output; end;
end; _error_=0; run;

- except that I do not understand your calculation of the cumulated distance (nor why the start_position_on_link should be 0.1 on the last obs), so I made that up.

 

The idea is to make an iterative travel through the tree by adding nodes in the end of the table, which will then be read iteratively. The _error_ variable is set to 0 after the loop ends, because it would create an error message every time there are no more nodes to read.

 

Edit: I put in the outer loop (do from_node=.,to_node) to make the datastep reread the whole set of nodes, if there happens to be a point lying on the ending link of the last link read from the previous point.

Ksharp
Super User

The following code could get connected component.

 


data have;
infile cards ;
input from $  to $ ;
cards;
1     2
1     3
4     5
5     2
9     4
6     7
8     7
;
run;
data full;
  set have end=last;
  if _n_ eq 1 then do;
   declare hash h();
    h.definekey('node');
     h.definedata('node');
     h.definedone();
  end;
  output;
  node=from; h.replace();
  from=to; to=node;
  output;
  node=from; h.replace();
  if last then h.output(dataset:'node');
  drop node;
run;


data want(keep=node household);
declare hash ha(ordered:'a');
declare hiter hi('ha');
ha.definekey('count');
ha.definedata('last');
ha.definedone();
declare hash _ha(hashexp: 20);
_ha.definekey('key');
_ha.definedone();

if 0 then set full;
declare hash from_to(dataset:'full(where=(from is not missing and to is not missing))',hashexp:20,multidata:'y');
 from_to.definekey('from');
 from_to.definedata('to');
 from_to.definedone();

if 0 then set node;
declare hash no(dataset:'node');
declare hiter hi_no('no');
 no.definekey('node');
 no.definedata('node');
 no.definedone();
 

do while(hi_no.next()=0);
 household+1; output;
 count=1;
 key=node;_ha.add();
 last=node;ha.add();
 rc=hi.first();
 do while(rc=0);
   from=last;rx=from_to.find();
   do while(rx=0);
     key=to;ry=_ha.check();
      if ry ne 0 then do;
       node=to;output;rr=no.remove(key:node);
       key=to;_ha.add();
       count+1;
       last=to;ha.add();
      end;
      rx=from_to.find_next();
   end;
   rc=hi.next();
end;
ha.clear();_ha.clear();
end;
stop;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2175 views
  • 2 likes
  • 5 in conversation