I'm sorry that I haven't responded sooner. I've been swamped at work, so I haven't been able to test anything until now.
@Reeza) You were right, it is a Case/Control problem! I had to modify the %dist macro slightly, but it is just great for my needs. Thank you!
@PGStats) I'm sorry, but I couldn't quite make sense of your code. However, I used some of your ideas when implementing my example.
@Arthur) We are predicting churn in general using other methods, which are better suited to that purpose. The main point of this program is to find customers to approach for a sales attempt. Nevertheless, I would be interested to know whether this nearest-neighbour classification could be used in other areas as well.
I've added the code for a simple example using my approach below:
filename macro '\\P-114-230-013\Arbejdsmapper\BJF\Mersalg\Macro';
%inc macro(nobs);
%inc macro(distMacro_modified);
/* Creating example dataset */
data all;
    infile datalines dsd;
    input Group : 1.          /* 0 = customers with a sales attempt, 1 = customers without a sales attempt */
          ID : 2.
          Match1 : 2.
          Match2 : 2.
          SuccesfulSale : 1.; /* Boolean indicating whether a sales attempt was successful */
datalines;
0, 1, 7, 4, 1
0, 2, 9, 6, 0
0, 3, 22, 8, 0
0, 4, 27, 10, 1
1, 5, 5, 5, .
1, 6, 22, 7, .
1, 7, 17, 8, .
1, 8, 5, 8, .
1, 9, 8, 9, .
1, 10, 10, 11, .
1, 11, 14, 12, .
1, 12, 18, 13, .
1, 13, 21, 14, .
1, 14, 23, 15, .
0, 15, 2, 22, 0
0, 16, 9, 5, 1
0, 17, 17, 13, 1
0, 18, 29, 2, 1
0, 19, 14, 14, 0
0, 20, 4, 17, 0
;
run;
proc sort data=all; by group id; run;
/* Use slightly altered version of the distance macro */
%dist(data=all,group=group,id=id,mvars=Match1 Match2,wts=1 2.5,
out=distanceMatrix,transf=1,dist=2);
/* Outputs a matrix with the combination of each customer without sales and customer with sales including the distance between them. */
data distanceMatrixSelectNearest;
    set distanceMatrix;
    array idSales _C_ID1-_C_ID10;
    array dist _C1-_C10;
    /* One output row per (customer, candidate neighbour) pair */
    do _i_ = 1 to 10;
        distanceNearest = dist{_i_};
        idNearest = idSales{_i_};
        output;
    end;
    keep id idNearest distanceNearest;
run;
proc sort data=distanceMatrixSelectNearest; by id distanceNearest; run;
/* Selects the 3 nearest neighbours */
data distanceMatrixSelectNearest2;
    retain nearestNeighbourCount;
    set distanceMatrixSelectNearest;
    by id distanceNearest;
    /* Number the neighbours within each customer by increasing distance */
    if first.id then nearestNeighbourCount = 1;
    else nearestNeighbourCount = nearestNeighbourCount + 1;
    if nearestNeighbourCount <= 3;
run;
proc sort data=distanceMatrixSelectNearest2; by idNearest; run;
/* Merges sales information onto the NN matrix */
proc sort data=all out=salesAttempt (drop=group match1 match2); where group=0; by id; run;
data distanceMatrixSelectNearest3;
    merge distanceMatrixSelectNearest2 (in=a)
          salesAttempt (rename=(id=idNearest));
    by idNearest;
    if a;
run;
proc sort data=distanceMatrixSelectNearest3; by id nearestNeighbourCount; run;
/* Create a single observation for each customer without a sales attempt, containing a sales prediction as well as the id of and distance to each nearest neighbour */
data customersToCall;
    retain nearestNeighbour1-nearestNeighbour3;
    retain nearestNeighbourSales1-nearestNeighbourSales3;
    retain nearestNeighbourDist1-nearestNeighbourDist3;
    array nn_id nearestNeighbour1-nearestNeighbour3;
    array nn_sale nearestNeighbourSales1-nearestNeighbourSales3;
    array nn_dist nearestNeighbourDist1-nearestNeighbourDist3;
    set distanceMatrixSelectNearest3;
    by id;
    nn_id{nearestNeighbourCount} = idNearest;
    nn_dist{nearestNeighbourCount} = distanceNearest;
    nn_sale{nearestNeighbourCount} = succesfulSale;
    drop idNearest distanceNearest succesfulSale nearestNeighbourCount;
    /* Only the last row per customer has all three neighbours filled in */
    if last.id;
    totalDist = 0;
    salesPrediction = 0;
    /* Find the total distance */
    do _i_ = 1 to 3;
        totalDist = totalDist + nn_dist{_i_};
    end;
    /* Sum the neighbours' sales outcomes and take the average */
    do _i_ = 1 to 3;
        salesPrediction = salesPrediction + nn_sale{_i_};
    end;
    salesPrediction = salesPrediction / 3;
run;
proc sort data=customersToCall; by descending salesPrediction; run;
/* Keep the 3 customers with the highest sales prediction */
data customersToCall2;
    set customersToCall;
    by descending salesPrediction;
    if _n_ <= 3;
run;
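For anyone who does not have the %dist macro, the steps above can be roughly approximated with PROC SQL and one DATA step. This is only a minimal sketch under my own assumptions: it uses a weighted Euclidean distance on the raw Match1 and Match2 values, with the weights 1 and 2.5 multiplying the squared differences. Whether %dist applies the weights this way, and whatever transf=1 does, is not reproduced here, so the distances need not match the macro's output. The dataset names pairDistances, nearest3 and predictions, and the variable nnCount, are made up for this illustration.

```sas
/* Sketch only: distance between every customer without a sales attempt (group=1)
   and every customer with one (group=0).
   ASSUMPTION: weighted Euclidean distance on raw values, no standardization. */
proc sql;
    create table pairDistances as
    select t.id as id,
           s.id as idNearest,
           s.SuccesfulSale,
           sqrt(1   * (t.Match1 - s.Match1)**2
              + 2.5 * (t.Match2 - s.Match2)**2) as distanceNearest
    from all(where=(group = 1)) as t,
         all(where=(group = 0)) as s
    order by t.id, distanceNearest;
quit;

/* Keep the 3 nearest neighbours per customer (ties are broken arbitrarily) */
data nearest3;
    set pairDistances;
    by id;
    if first.id then nnCount = 0;
    nnCount + 1;              /* sum statement: nnCount is retained across rows */
    if nnCount <= 3;
run;

/* Average the neighbours' outcomes to get the prediction per customer */
proc sql;
    create table predictions as
    select id,
           mean(SuccesfulSale)  as salesPrediction,
           sum(distanceNearest) as totalDist
    from nearest3
    group by id
    order by salesPrediction desc;
quit;
```

Because the cross join already returns one row per customer/neighbour pair, this version needs no array transposition or merge. The cross join grows with the product of the two group sizes, so it is probably only practical for moderately sized data.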
Feel free to correct me on my code and my thoughts on using this approach. Once again, thank you!
Kind regards, Bjarke