The code below calls the MILP solver twice, once for each objective. The first solve uses the decomposition algorithm, which exploits the fact that cluster labels are arbitrary and hence the subproblem has identical blocks. (For an amusing application to wedding planning, see this blog post.) The second solve encourages clusters with high total node weights by maximizing the minimum total node weight per cluster.
proc optmodel;
/* declare parameters and read data */
num max_distance = 70;
set NODES;
str state {NODES};
str city {NODES};
num latitude {NODES};
num longitude {NODES};
num node_weight {NODES};
read data indata into NODES=[_N_] state city latitude longitude node_weight;
num distance {i in NODES, j in NODES: i < j} =
geodist(latitude[i], longitude[i], latitude[j], longitude[j], 'DM');
set CLUSTERS init NODES;
/* UseCluster[k] = 1 if cluster k is used, 0 otherwise */
var UseCluster {CLUSTERS} binary;
/* AssignNodeToCluster[i,k] = 1 if node i is assigned to cluster k, 0 otherwise */
var AssignNodeToCluster {NODES, CLUSTERS} binary;
/* primary objective: minimize number of used clusters */
min NumUsedClusters = sum {k in CLUSTERS} UseCluster[k];
/* assign each node to exactly one cluster */
con AssignOnce {i in NODES}:
sum {k in CLUSTERS} AssignNodeToCluster[i,k] = 1;
/* if AssignNodeToCluster[i,k] = 1 then UseCluster[k] = 1 */
con AssignToOpenCluster {i in NODES, k in CLUSTERS}:
AssignNodeToCluster[i,k] <= UseCluster[k];
/* cannot assign i and j to same cluster k if distance[i,j] > max_distance */
set CONFLICTS = {i in NODES, j in NODES: i < j and distance[i,j] > max_distance};
con Conflict {<i,j> in CONFLICTS, k in CLUSTERS}:
AssignNodeToCluster[i,k] + AssignNodeToCluster[j,k] <= UseCluster[k];
/* declare implicit variable to capture total node weight for each cluster */
impvar NodeWeightPerCluster {k in CLUSTERS} =
sum {i in NODES} node_weight[i] * AssignNodeToCluster[i,k];
/* call MILP solver to optimize primary objective */
solve with MILP / decomp=(method=set);
/* relabel clusters from 1 to the minimum number necessary */
num assigned_cluster {NODES};
for {i in NODES} do;
for {k in CLUSTERS: AssignNodeToCluster[i,k].sol > 0.5} do;
assigned_cluster[i] = k;
leave;
end;
end;
num new_cluster {1..card(NODES)};
num new_k init 0;
for {k in CLUSTERS: UseCluster[k].sol > 0.5} do;
new_k = new_k + 1;
new_cluster[k] = new_k;
end;
for {i in NODES, k in CLUSTERS} AssignNodeToCluster[i,k] = 0;
for {i in NODES} do;
assigned_cluster[i] = new_cluster[assigned_cluster[i]];
AssignNodeToCluster[i,assigned_cluster[i]] = 1;
end;
CLUSTERS = 1..new_k;
for {k in CLUSTERS} fix UseCluster[k] = 1;
/* secondary objective: maximize minimum node weight across clusters */
var MinNodeWeight >= 0 init min {k in CLUSTERS} NodeWeightPerCluster[k].sol;
max MaxMin = MinNodeWeight;
con MaxMinDef {k in CLUSTERS}:
MinNodeWeight <= NodeWeightPerCluster[k];
/* call MILP solver to optimize secondary objective */
solve with MILP / primalin;
for {i in NODES} do;
for {k in CLUSTERS: AssignNodeToCluster[i,k].sol > 0.5} do;
assigned_cluster[i] = k;
leave;
end;
end;
/* save solution to SAS data set */
create data outdata from [node] state city latitude longitude node_weight cluster=assigned_cluster;
quit;
/* print and plot solution by cluster */
proc sort data=outdata;
by cluster;
run;
proc print data=outdata;
by cluster;
sum node_weight;
run;
proc sgplot data=outdata;
scatter y=latitude x=longitude / datalabel=city group=cluster;
run;
On my machine running SAS/OR 14.2, I get the following 15 clusters in about 2 seconds:
cluster=1
Obs
node
state
city
latitude
longitude
node_weight
1
1
WV
Gassaway
38.6732
-80.7748
30
2
2
WV
Quinwood
38.0576
-80.7068
40
cluster
70
cluster=2
Obs
node
state
city
latitude
longitude
node_weight
3
3
WV
Charleston
38.3498
-81.6326
55
4
4
WV
Kenova
38.3990
-82.5782
25
5
5
WV
Crum
37.9057
-82.4460
35
cluster
115
cluster=3
Obs
node
state
city
latitude
longitude
node_weight
6
6
AR
Fort Smith
35.3859
-94.3985
114
cluster=4
Obs
node
state
city
latitude
longitude
node_weight
7
7
IN
Indianapolis
39.7684
-86.1581
28
cluster=5
Obs
node
state
city
latitude
longitude
node_weight
8
8
IN
Jasper
38.3914
-86.9311
90
9
9
IN
New Albany
38.2856
-85.8241
34
10
10
IN
Saint Meinrad
38.1711
-86.8092
67
11
11
IN
Scottsburg
38.6856
-85.7702
260
12
12
KY
Louisville
38.2527
-85.7585
299
cluster
750
cluster=6
Obs
node
state
city
latitude
longitude
node_weight
13
13
KY
Winchester
37.9901
-84.1797
51
cluster=7
Obs
node
state
city
latitude
longitude
node_weight
14
14
NC
Charlotte
35.2271
-80.8431
294
cluster=8
Obs
node
state
city
latitude
longitude
node_weight
15
16
NC
Horse Shoe
35.343
-82.5567
49
cluster=9
Obs
node
state
city
latitude
longitude
node_weight
16
15
NC
High Point
35.9557
-80.0053
76
17
17
NC
Kernersville
36.1199
-80.0737
150
18
18
NC
Liberty
35.8535
-79.5717
89
cluster
315
cluster=10
Obs
node
state
city
latitude
longitude
node_weight
19
20
NJ
Secaucus
40.7895
-74.0565
202
20
21
NJ
South River
40.4465
-74.3860
56
cluster
258
cluster=11
Obs
node
state
city
latitude
longitude
node_weight
21
22
NY
Geneva
42.868
-76.9856
51
cluster=12
Obs
node
state
city
latitude
longitude
node_weight
22
23
OH
Grove City
39.8815
-83.0930
46
23
24
OH
Groveport
39.8532
-82.8883
63
cluster
109
cluster=13
Obs
node
state
city
latitude
longitude
node_weight
24
25
OH
Youngstown
41.0998
-80.6495
55
cluster=14
Obs
node
state
city
latitude
longitude
node_weight
25
26
PA
Corry
41.9203
-79.6403
35
cluster=15
Obs
node
state
city
latitude
longitude
node_weight
26
19
NC
Statesville
35.7826
-80.8873
56
27
27
VA
Galax
36.6612
-80.9240
43
cluster
99
2393
You can see that Charleston ends up in the "wrong" cluster. The maximin approach essentially ignores all clusters whose total node weights are higher than the minimum, which is 28 in this case.
By the way, how many locations are in your real data?
... View more