Hi SAS community,
It is with great interest that I read about the SAS PROC Optimization to visit all baseball stadiums, similar to the TSP (Traveling salesman problem).
I have the following data of 106 nodes (i.e., stations) and their connection via 103 unique arcs, like in a subway network, and I am interested in determining the optimal route to pass through each station, by minimizing the number of times one has to revisit. For simplicity, I could start my trip at I27 as a source sink.
Would anyone know how to do this using the SAS optimization language. I am interested in learning this as a base, then adding complexity such as scheudling, etc.
Also, can these optimization tools be run using SAS Enterprise Guide, or is it Viya only?
Thank you very much for your time,
-Carmine
data original_data;
input unique_arc $ node_1 $ node_2 $;
datalines;
1 I27 I26
2 I26 I25
3 I25 I24
4 I24 I23
5 I23 I22
6 I22 I21
7 I21 I20
8 I20 I19
9 I19 I18
10 I18 I17
11 I17 I16
12 I16 I15
13 I15 I14
14 I14 I13
15 I13 I12_E7
16 I12_E7 I11
17 I11 I10_S6
18 I10_S6 I9
19 I9 I8
20 I8 I7
21 I7 I6
22 I6 I5
23 I5 I4_A8
24 I4_A8 I3
25 I3 I2
26 I2 I1
27 A20 A19
28 A19 A18
29 A18 A17_E11
30 A17_E11 A16
31 A16 A15_S9
32 A15_S9 A14
33 A14 A13
34 A13 A12
35 A12 A11
36 A11 A10
37 A10 A9_E20
38 A9_E20 I4_A8
39 I4_A8 A7
40 A7 A6
41 A6 A5
42 A5 A4
43 A4 A3
44 A3 A2
45 A2 A1
46 S21 S20
47 S20 S19
48 S19 S18
49 S18 S17
50 S17 S16
51 S16 S15
52 S15 S14
53 S14 S13
54 S13 S12
55 S12 S11_E13
56 S11_E13 S10
57 S10 A15_S9
58 A15_S9 S8
59 S8 S7
60 S7 I10_S6
61 I10_S6 S5
62 S5 S4
63 S4 S3
64 S3 S2
65 S2 S1_E27
66 E38 E37
67 E37 E36
68 E36 E35
69 E35 E34
70 E34 E33
71 E33 E32
72 E32 E31
73 E31 E30
74 E30 E29
75 E29 E28
76 E28 S1_E27
77 S1_E27 E26
78 E26 E25
79 E25 E24
80 E24 E23
81 E23 E22
82 E22 E21
83 E21 A9_E20
84 A9_E20 E19
85 E19 E18
86 E18 E17
87 E17 E16
88 E16 E15
89 E15 E14
90 E14 S11_E13
91 S11_E13 E12
92 E12 A17_E11
93 A17_E11 E10
94 E10 E9
95 E9 E8
96 E8 I12_E7
97 I12_E7 E6
98 E6 E5
99 E5 E4
100 E4 E3
101 E3 E2
102 E2 E1
103 E1 E28
;
run;
You want to find the minimum number of edges that need to be duplicated to make the graph Eulerian, that is, to make every node have even degree. This problem is known as the "Chinese postman" problem or "route inspection" problem.
The following PROC OPTMODEL code (which works in both SAS 9 and SAS Viya) uses the network solver to find shortest-path distances for all pairs of odd-degree nodes and then uses the MILP solver to find an optimal matching in the complete graph of odd-degree nodes. The edges that appear in the shortest paths for the matches are the ones that should be duplicated.
proc optmodel;
/* define graph */
set <str,str> EDGES;
read data original_data into EDGES=[node_1 node_2];
put (card(EDGES))=;
set NODES = union {<i,j> in EDGES} {i,j};
put (card(NODES))=;
num deg {NODES} init 0;
for {<i,j> in EDGES} do;
deg[i] = deg[i] + 1;
deg[j] = deg[j] + 1;
end;
set ODD_NODES = {i in NODES: mod(deg[i],2) = 1};
put ODD_NODES=;
put (card(ODD_NODES))=;
/* find shortest-path distances for all pairs of odd nodes */
num spweights {ODD_NODES, ODD_NODES} init .;
solve with network /
shortpath=(source=ODD_NODES sink=ODD_NODES) links=(include=EDGES) out=(spweights=spweights);
print spweights;
set ODD_PAIRS = {i in ODD_NODES, j in ODD_NODES: i < j and spweights[i,j] ne .};
put (card(ODD_PAIRS))=;
set <str,str> ODD_PAIRS_node {ODD_NODES} init {};
for {<i,j> in ODD_PAIRS} do;
ODD_PAIRS_node[i] = ODD_PAIRS_node[i] union {<i,j>};
ODD_PAIRS_node[j] = ODD_PAIRS_node[j] union {<i,j>};
end;
/* find optimal matching in complete graph of odd nodes */
/* IsMatch[i,j] = 1 if odd pair <i,j> is matched, 0 otherwise */
var IsMatch {ODD_PAIRS} binary;
/* minimize number of duplicated edges */
min NumDuplicatedEdges = sum {<i,j> in ODD_PAIRS} spweights[i,j] * IsMatch[i,j];
/* force every odd node to be matched */
con Matching {k in ODD_NODES}:
sum {<i,j> in ODD_PAIRS_node[k]} IsMatch[i,j] = 1;
/* call MILP solver */
solve;
set MATCHES = {<i,j> in ODD_PAIRS: IsMatch[i,j].sol > 0.5};
put MATCHES=;
put (card(MATCHES))=;
/* retrieve actual shortest paths only for matched pairs */
str source, sink;
set <str,str> SP_EDGES {MATCHES};
set <str,str,num,str,str> SP_PATHS; /* <source,sink,order,from,to> */
set SOURCES = {source};
set SINKS = {sink};
reset option printlevel=0;
cofor {<i,j> in MATCHES} do;
source = i;
sink = j;
solve with network / loglevel=0
shortpath=(source=SOURCES sink=SINKS) links=(include=EDGES) out=(sppaths=SP_PATHS);
SP_EDGES[i,j] = setof {<(i),(j),order,from,to> in SP_PATHS} <from,to>;
end;
reset option printlevel=1;
/* update degrees */
set EDGES_SOL = union {<i,j> in MATCHES} SP_EDGES[i,j];
put (card(EDGES_SOL))=;
num new_deg {i in NODES} init deg[i];
for {<i,j> in EDGES_SOL} do;
new_deg[i] = new_deg[i] + 1;
new_deg[j] = new_deg[j] + 1;
end;
/* create SAS data sets */
create data node_data_out from [i]=NODES deg degmod2=(mod(deg[i],2)) new_deg;
create data edge_data_out from [i j]=EDGES_SOL;
quit;
Your network has 98 nodes (not 106) and 103 edges, and the following 8 nodes have odd degree:
ODD_NODES={'I27','I1','A20','A1','S21','S1_E27','E38','E28'}
The shortest-path distances are:
spweights | ||||||||
---|---|---|---|---|---|---|---|---|
A1 | A20 | E28 | E38 | I1 | I27 | S1_E27 | S21 | |
A1 | . | 19 | 16 | 26 | 10 | 30 | 15 | 25 |
A20 | 19 | . | 14 | 24 | 15 | 22 | 13 | 15 |
E28 | 16 | 14 | . | 10 | 12 | 22 | 1 | 21 |
E38 | 26 | 24 | 10 | . | 22 | 32 | 11 | 31 |
I1 | 10 | 15 | 12 | 22 | . | 26 | 11 | 21 |
I27 | 30 | 22 | 22 | 32 | 26 | . | 22 | 31 |
S1_E27 | 15 | 13 | 1 | 11 | 11 | 22 | . | 20 |
S21 | 25 | 15 | 21 | 31 | 21 | 31 | 20 | . |
An optimal matching, with weight 57, is:
MATCHES={<'I27','S1_E27'>,<'A20','S21'>,<'A1','I1'>,<'E28','E38'>}
(The pair <'E28','S1_E27'> is tempting because it has weight 1, but the optimal matching that includes that pair turns out to have weight 58 > 57.)
The node_data_out data set reports both the old and new degrees, and the edge_data_out data set contains only the 57 edges that need duplication.
If you have edge weights (like distances between stations), the same approach can be used to minimize the total duplicated edge weight instead of just the number of duplicated edges.
You want to find the minimum number of edges that need to be duplicated to make the graph Eulerian, that is, to make every node have even degree. This problem is known as the "Chinese postman" problem or "route inspection" problem.
The following PROC OPTMODEL code (which works in both SAS 9 and SAS Viya) uses the network solver to find shortest-path distances for all pairs of odd-degree nodes and then uses the MILP solver to find an optimal matching in the complete graph of odd-degree nodes. The edges that appear in the shortest paths for the matches are the ones that should be duplicated.
proc optmodel;
/* define graph */
set <str,str> EDGES;
read data original_data into EDGES=[node_1 node_2];
put (card(EDGES))=;
set NODES = union {<i,j> in EDGES} {i,j};
put (card(NODES))=;
num deg {NODES} init 0;
for {<i,j> in EDGES} do;
deg[i] = deg[i] + 1;
deg[j] = deg[j] + 1;
end;
set ODD_NODES = {i in NODES: mod(deg[i],2) = 1};
put ODD_NODES=;
put (card(ODD_NODES))=;
/* find shortest-path distances for all pairs of odd nodes */
num spweights {ODD_NODES, ODD_NODES} init .;
solve with network /
shortpath=(source=ODD_NODES sink=ODD_NODES) links=(include=EDGES) out=(spweights=spweights);
print spweights;
set ODD_PAIRS = {i in ODD_NODES, j in ODD_NODES: i < j and spweights[i,j] ne .};
put (card(ODD_PAIRS))=;
set <str,str> ODD_PAIRS_node {ODD_NODES} init {};
for {<i,j> in ODD_PAIRS} do;
ODD_PAIRS_node[i] = ODD_PAIRS_node[i] union {<i,j>};
ODD_PAIRS_node[j] = ODD_PAIRS_node[j] union {<i,j>};
end;
/* find optimal matching in complete graph of odd nodes */
/* IsMatch[i,j] = 1 if odd pair <i,j> is matched, 0 otherwise */
var IsMatch {ODD_PAIRS} binary;
/* minimize number of duplicated edges */
min NumDuplicatedEdges = sum {<i,j> in ODD_PAIRS} spweights[i,j] * IsMatch[i,j];
/* force every odd node to be matched */
con Matching {k in ODD_NODES}:
sum {<i,j> in ODD_PAIRS_node[k]} IsMatch[i,j] = 1;
/* call MILP solver */
solve;
set MATCHES = {<i,j> in ODD_PAIRS: IsMatch[i,j].sol > 0.5};
put MATCHES=;
put (card(MATCHES))=;
/* retrieve actual shortest paths only for matched pairs */
str source, sink;
set <str,str> SP_EDGES {MATCHES};
set <str,str,num,str,str> SP_PATHS; /* <source,sink,order,from,to> */
set SOURCES = {source};
set SINKS = {sink};
reset option printlevel=0;
cofor {<i,j> in MATCHES} do;
source = i;
sink = j;
solve with network / loglevel=0
shortpath=(source=SOURCES sink=SINKS) links=(include=EDGES) out=(sppaths=SP_PATHS);
SP_EDGES[i,j] = setof {<(i),(j),order,from,to> in SP_PATHS} <from,to>;
end;
reset option printlevel=1;
/* update degrees */
set EDGES_SOL = union {<i,j> in MATCHES} SP_EDGES[i,j];
put (card(EDGES_SOL))=;
num new_deg {i in NODES} init deg[i];
for {<i,j> in EDGES_SOL} do;
new_deg[i] = new_deg[i] + 1;
new_deg[j] = new_deg[j] + 1;
end;
/* create SAS data sets */
create data node_data_out from [i]=NODES deg degmod2=(mod(deg[i],2)) new_deg;
create data edge_data_out from [i j]=EDGES_SOL;
quit;
Your network has 98 nodes (not 106) and 103 edges, and the following 8 nodes have odd degree:
ODD_NODES={'I27','I1','A20','A1','S21','S1_E27','E38','E28'}
The shortest-path distances are:
spweights | ||||||||
---|---|---|---|---|---|---|---|---|
A1 | A20 | E28 | E38 | I1 | I27 | S1_E27 | S21 | |
A1 | . | 19 | 16 | 26 | 10 | 30 | 15 | 25 |
A20 | 19 | . | 14 | 24 | 15 | 22 | 13 | 15 |
E28 | 16 | 14 | . | 10 | 12 | 22 | 1 | 21 |
E38 | 26 | 24 | 10 | . | 22 | 32 | 11 | 31 |
I1 | 10 | 15 | 12 | 22 | . | 26 | 11 | 21 |
I27 | 30 | 22 | 22 | 32 | 26 | . | 22 | 31 |
S1_E27 | 15 | 13 | 1 | 11 | 11 | 22 | . | 20 |
S21 | 25 | 15 | 21 | 31 | 21 | 31 | 20 | . |
An optimal matching, with weight 57, is:
MATCHES={<'I27','S1_E27'>,<'A20','S21'>,<'A1','I1'>,<'E28','E38'>}
(The pair <'E28','S1_E27'> is tempting because it has weight 1, but the optimal matching that includes that pair turns out to have weight 58 > 57.)
The node_data_out data set reports both the old and new degrees, and the edge_data_out data set contains only the 57 edges that need duplication.
If you have edge weights (like distances between stations), the same approach can be used to minimize the total duplicated edge weight instead of just the number of duplicated edges.
I think "I12_E7" means two nodes a.k.a
E8 I12_E7
----->
E8 I12
E8 E7
Only do that you can get 106 nodes.
Try dataset "original_data2" .
P.S. OP want start from node "I27" .
data original_data;
input unique_arc $ node_1 $ node_2 $;
datalines;
1 I27 I26
2 I26 I25
3 I25 I24
4 I24 I23
5 I23 I22
6 I22 I21
7 I21 I20
8 I20 I19
9 I19 I18
10 I18 I17
11 I17 I16
12 I16 I15
13 I15 I14
14 I14 I13
15 I13 I12_E7
16 I12_E7 I11
17 I11 I10_S6
18 I10_S6 I9
19 I9 I8
20 I8 I7
21 I7 I6
22 I6 I5
23 I5 I4_A8
24 I4_A8 I3
25 I3 I2
26 I2 I1
27 A20 A19
28 A19 A18
29 A18 A17_E11
30 A17_E11 A16
31 A16 A15_S9
32 A15_S9 A14
33 A14 A13
34 A13 A12
35 A12 A11
36 A11 A10
37 A10 A9_E20
38 A9_E20 I4_A8
39 I4_A8 A7
40 A7 A6
41 A6 A5
42 A5 A4
43 A4 A3
44 A3 A2
45 A2 A1
46 S21 S20
47 S20 S19
48 S19 S18
49 S18 S17
50 S17 S16
51 S16 S15
52 S15 S14
53 S14 S13
54 S13 S12
55 S12 S11_E13
56 S11_E13 S10
57 S10 A15_S9
58 A15_S9 S8
59 S8 S7
60 S7 I10_S6
61 I10_S6 S5
62 S5 S4
63 S4 S3
64 S3 S2
65 S2 S1_E27
66 E38 E37
67 E37 E36
68 E36 E35
69 E35 E34
70 E34 E33
71 E33 E32
72 E32 E31
73 E31 E30
74 E30 E29
75 E29 E28
76 E28 S1_E27
77 S1_E27 E26
78 E26 E25
79 E25 E24
80 E24 E23
81 E23 E22
82 E22 E21
83 E21 A9_E20
84 A9_E20 E19
85 E19 E18
86 E18 E17
87 E17 E16
88 E16 E15
89 E15 E14
90 E14 S11_E13
91 S11_E13 E12
92 E12 A17_E11
93 A17_E11 E10
94 E10 E9
95 E9 E8
96 E8 I12_E7
97 I12_E7 E6
98 E6 E5
99 E5 E4
100 E4 E3
101 E3 E2
102 E2 E1
103 E1 E28
;
run;
data original_data2;
set original_data;
do i=1 to countw(node_1,'_');
_start=scan(node_1,i,'_');
do j=1 to countw(node_2,'_');
_end=scan(node_2,j,'_');
output;
end;
end;
keep _start _end;
run;
For the new network, there are 106 nodes and 135 edges. If you rename the data set to original_data and the variables to node_1 and node_2, the same PROC OPTMODEL code yields 32 odd-degree nodes and a minimum of 63 duplicated edges.
Regarding the starting node, note that the output just tells you which edges to duplicate. To generate an Eulerian cycle for the resulting Eulerian (multi)graph, you can start wherever you want and just keep traversing edges in any order, marking them as done as you go. Because the node degrees are even, you will never get stuck and instead will end up at the node where you started.
Thank you. You are indeed correct with the number of nodes. I will try to learn and understand the syntax to adapt to other problems. The arcs and nodes represent the Toei Subway in Tokyo.
Thank you again for your help!
-Carmine
The relaxation you describe is the difference between Eulerian cycle and Eulerian path. Instead of matching all odd-degree nodes, you can leave two unmatched, and these become the endpoints of the Eulerian path. You can capture this variant with the following minor code changes:
* con Matching {k in ODD_NODES}:
sum {<i,j> in ODD_PAIRS_node[k]} IsMatch[i,j] = 1;
var IsMatched {ODD_NODES} binary;
con Matching {k in ODD_NODES}:
sum {<i,j> in ODD_PAIRS_node[k]} IsMatch[i,j] = IsMatched[k];
con AtMostTwoUnmatched:
sum {k in ODD_NODES} (1 - IsMatched[k]) <= 2;
The resulting matches are now:
MATCHES={<'A20','S21'>,<'A1','I1'>,<'E28','S1_E27'>}
The optimal objective value is now 26 instead of 57, and the two unmatched nodes are E38 and I27.
Thank you Rob for sharing! This is very useful. I am excited to start applying this to additional data I have and learn the syntax.
Have a great day,
-Carmine
Hi
You also can use network solver for TSP. This approach more efficient.
data Cities; input city $20. lat long; datalines; Albany, NY 42.652552778 -73.75732222 Annapolis, MD 38.978611111 -76.49111111 Atlanta, GA 33.749272222 -84.38826111 Augusta, ME 44.307236111 -69.78167778 Austin, TX 30.274722222 -97.74055556 Baton Rouge, LA 30.457072222 -91.18740556 Bismarck, ND 46.820813889 -100.7827417 Boise, ID 43.617697222 -116.1996139 Boston, MA 42.357708333 -71.06356389 Carson City, NV 39.164075 -119.7662917 Charleston, WV 38.336388889 -81.61222222 Cheyenne, WY 41.140277778 -104.8197222 Columbia, SC 34.000433333 -81.03314722 Columbus, OH 39.961388889 -82.99888889 Concord, NH 43.206747222 -71.53812778 Denver, CO 39.739094444 -104.9848972 Des Moines, IA 41.591177778 -93.60386944 Dover, DE 39.157305556 -75.51972222 Frankfort, KY 38.186777778 -84.87533333 Harrisburg, PA 40.264444444 -76.86666667 Hartford, CT 41.764136111 -72.68277778 Helena, MT 46.5857 -112.0184 Indianapolis, IN 39.768611111 -86.1625 Jackson, MS 32.303888889 -90.18222222 Jefferson City, MO 38.579119444 -92.17299167 Lansing, MI 42.733727778 -84.55558889 Lincoln, NE 40.808088889 -96.69958611 Little Rock, AR 34.746758333 -92.28876111 Madison, WI 43.074444444 -89.38472222 Montgomery, AL 32.377447222 -86.30094167 Montpelier, VT 44.262222222 -72.58083333 Nashville, TN 36.165833333 -86.78416667 Oklahoma City, OK 35.492280556 -97.50337222 Olympia, WA 47.035277778 -122.9063889 Phoenix, AZ 33.448097222 -112.0970944 Pierre, SD 44.367166667 -100.3463528 Providence, RI 41.830833333 -71.415 Raleigh, NC 35.780277778 -78.63916667 Richmond, VA 37.538758333 -77.43359444 Sacramento, CA 38.576572222 -121.4934111 Saint Paul, MN 44.955147222 -93.10223611 Salem, OR 44.938730556 -123.0300972 Salt Lake City, UT 40.777222222 -111.8880556 Santa Fe, NM 35.682280556 -105.9396583 Springfield, IL 39.798516667 -89.65488889 Tallahassee, FL 30.438111111 -84.2816 Topeka, KS 39.048008333 -95.67815556 Trenton, NJ 40.220436111 -74.76990278 Washington, DC 38.889802778 -77.00911389 ;
From this list, you can generate a links data set, CitiesDist
, that contains the distances (in miles) between each pair of cities. The distances are calculated by using the SAS function GEODIST.
/* create a list of all the possible pairs of cities */ proc sql; create table CitiesDist as select a.city as city1, a.lat as lat1, a.long as long1, b.city as city2, b.lat as lat2, b.long as long2, geodist(lat1, long1, lat2, long2, 'DM') as distance from Cities as a, Cities as b where a.city < b.city; quit;
The following PROC OPTMODEL statements find the optimal tour of all the capital cities:
/* find optimal tour by using the network solver */ proc optmodel; set<str,str> CAPPAIRS; set<str> CAPITALS = union {<i,j> in CAPPAIRS} {i,j}; num distance{i in CAPITALS, j in CAPITALS: i < j}; read data CitiesDist into CAPPAIRS=[city1 city2] distance; set<str,str> TOUR; num order{CAPITALS}; solve with NETWORK / loglevel = moderate links = (weight=distance) tsp out = (order=order tour=TOUR) ; put (sum{<i,j> in TOUR} distance[i,j]); /* create tour-ordered pairs (rather than input-ordered pairs) */ str CAPbyOrder{1..card(CAPITALS)}; for {i in CAPITALS} CAPbyOrder[order[i]] = i; set TSPEDGES init setof{i in 2..card(CAPITALS)} <CAPbyOrder[i-1],CAPbyOrder[i]> union {<CAPbyOrder[card(CAPITALS)],CAPbyOrder[1]>}; num distance2{<i,j> in TSPEDGES} = if i < j then distance[i,j] else distance[j,i]; create data TSPTourNodes from [node] tsp_order=order; create data TSPTourLinks from [city1 city2]=TSPEDGES distance=distance2; quit;
Hi Rob
Thank you for response. Here two different problems, sure. My main point about method.
You provided solutions for MILP initially. For MILP, this task will be very unpleasant) I would use network or local search solver.
For problems that the network solver supports, I agree that the specialized algorithms will generally outperform the generic MILP solver. After all, that's one of the biggest motivations to have the network solver. But the network solver does not currently support an option to solve the Chinese postman problem. Fortunately, the PROC OPTMODEL code I showed that uses the network solver to find shortest paths and the MILP solver to find an optimal matching solves the Chinese postman problem exactly in less than 1 second for the provided data.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.