data main;
input id a b;
datalines;
1 0.7 1.3
2 0.5 0.3
3 2.0 0.8
4 0.6 0.45
5 .25 .4
;
run;

data ref;
input id a b;
datalines;
1 0.6 1.2
2 0.5 0.7
3 -0.3 1.5
4 0.4 0.5
;
run;

proc fastclus data=main maxc=1 replace=none maxiter=0 noprint seed=ref out=mahalanobis_to_point(drop=cluster);
var a b;
run;

After the first iteration we should have something like this in the mahalanobis_to_point data set:

Obs  id  a     b     CLUSTER  DISTANCE
1    1   0.7   1.30  1        0.20000
2    3   2.0   0.80  1        1.20830
4    4   0.6   0.45  4        0.20616
5    2   0.5   0.30  4        0.22361
6    5   0.25  0.4   5        0.32214

You then use code like this:

proc sql noprint;
create table want(drop=diff) as
select a.*, b.id as out_id, b.distance as diff, b.a as out_a, b.b as out_b
from ref a, mahalanobis_to_point b
where a.id=1
having b.distance=min(b.distance);
quit;

The want data set will look like:

Obs  id  a    b    out_id  out_a  out_b
1    1   0.6  1.2  1       0.7    1.3

In the second iteration the ref data set becomes:

data ref_1st; * let's call this data set ref_1st;
input id a b;
datalines;
2 0.5 0.7
3 -0.3 1.5
4 0.4 0.5
;
run;

This is because we have found the observation in main with the smallest distance to a=0.6, b=1.2 and have removed that reference observation (id=1) from the ref data set.
We then run

proc fastclus data=main maxc=1 replace=none maxiter=0 noprint seed=ref_1st out=mahalanobis_to_point(drop=cluster);
var a b;
run;

We again select the observation in the mahalanobis_to_point data set with the smallest distance, this time to a=0.5, b=0.7, and add it to the want data set:

proc sql noprint;
create table next_match(drop=diff) as
select a.*, b.id as out_id, b.distance as diff, b.a as out_a, b.b as out_b
from ref_1st a, mahalanobis_to_point b
where a.id=2
having b.distance=min(b.distance);
quit;

proc append base=want data=next_match;
run;

The growing want data set will now look like:

Obs  id  a    b    out_id  out_a  out_b
1    1   0.6  1.2  1       0.7    1.3
2    2   0.5  0.7  4       0.6    0.45

We then remove the matched reference observation from the ref_1st data set; let's call the resulting data set ref_2. It will look like:

data ref_2; * let's call this data set ref_2;
input id a b;
datalines;
3 -0.3 1.5
4 0.4 0.5
;
run;

We then run proc fastclus again:

proc fastclus data=main maxc=1 replace=none maxiter=0 noprint seed=ref_2 out=mahalanobis_to_point(drop=cluster);
var a b;
run;

This continues until we exhaust all the observations in the ref data set. That is, we find the match for each of the four observations in the ref data set. Observe that the data set "main" stays the same throughout the iterations, while the ref data set shrinks by one observation after each iteration.
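Because main never changes and each reference observation is matched on its own, the whole loop can probably be collapsed into a single query. The following is only a sketch under some assumptions, not a drop-in replacement for the PROC FASTCLUS steps: it assumes the coordinates are stored in variables named a and b in both data sets, it computes plain Euclidean distance directly in PROC SQL, and it keeps every tied observation if two rows of main are equally close to a reference point. The aliases r and m and the output column names are chosen for illustration.

```sas
proc sql;
  create table want as
  select r.id, r.a, r.b,
         m.id as out_id,          /* id of the closest observation in main */
         m.a  as out_a,
         m.b  as out_b
  from ref as r, main as m        /* every ref row paired with every main row */
  group by r.id                   /* one group per reference observation */
  /* keep only the pair(s) whose distance equals the group minimum */
  having sqrt((r.a - m.a)**2 + (r.b - m.b)**2)
       = min(sqrt((r.a - m.a)**2 + (r.b - m.b)**2))
  order by r.id;
quit;
```

SAS writes a note to the log that the query requires remerging summary statistics with the original data; that is expected here. With the sample data, ref id 1 pairs with main id 1 and ref id 2 pairs with main id 4, which agrees with the two iterations walked through above. If the distance has to be computed on standardized or principal-component scores rather than on the raw a and b values, the same query would need to be run on those transformed variables instead.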