Nearest Neighbor

sinmathstat — Sun, 15 May 2016 04:25:37 GMT

Hi all,

I am new to this forum.

I have the following problem in SAS EM: the neighbors output by the MBR node appear in the wrong order. To illustrate the problem, I wrote a simple program that uses PROC PMBR to do the calculations.

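/* Training data: target y, inputs x1 and x2, id = row number */
data t1;
	input y x1 x2;
	id = _n_;
	cards;
1 12 14
1 11 10
0 3 4
0 5 2
;
run;

/* Data to be scored against t1 */
data t2;
	input y x1 x2;
	cards;
1 10 12
1 12 12
0 2 1
;
run;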

proc dmdb data=t1 dmdbcat=work.temp;
	var  x1 x2;
	class y;
run;

proc pmbr data=t1 dmdbcat=work.temp k=1 method=scan outest=t1_out 
		neighbors ;
	target y;
	id id;
	score outfit=t2_fit data=t2 out=t2_out role=validation;
	
run;

proc print data=t2_out;
run;

proc pmbr data=t1 dmdbcat=work.temp k=2 method=scan outest=t1_out 
		neighbors ;
	target y;
	id id;
	score outfit=t2_fit data=t2 out=t2_out role=validation;
	
run;

proc print data=t2_out;
run;

proc pmbr data=t1 dmdbcat=work.temp k=3 method=scan outest=t1_out 
		neighbors ;
	target y;
	id id;
	score outfit=t2_fit data=t2 out=t2_out role=validation;
	
run;

proc print data=t2_out;
run;

As you can see from the output, the neighbors are not in the correct order: in the first output _n1 is 2, but in the second output _n1 is 1. How can I produce the _n: values so that _n1 holds the first nearest neighbor, _n2 the second nearest neighbor, and so on?

Thanks.

Re: Nearest Neighbor

Reeza — Sun, 15 May 2016 05:05:31 GMT

I can't run those procs because I don't have EM, but couldn't you just sort the output dataset?

Re: Nearest Neighbor

WendyCzika — Mon, 16 May 2016 17:01:54 GMT

I don't believe there is any ordering implied by the columns _N1, _N2, ... They just show the top K nearest neighbors, not necessarily ordered from nearest to farthest, since all K neighbors have equal weight when scoring.
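
If an explicit nearest-to-farthest order is needed, one possible workaround is to recompute the distances outside PROC PMBR. The following is only a sketch: it assumes plain Euclidean distance on the raw x1 and x2 values (PMBR's internal distance may differ), it reuses t1 and t2 from the first post, and names such as qid, nn_rank, pairs, and nn_wide are made up for illustration.

```sas
/* Give each scoring row its own key (qid is a hypothetical name). */
data t2_q;
	set t2;
	qid = _n_;
run;

/* One row per (scoring row, training row) pair with its distance.  */
/* Assumption: plain Euclidean distance on the raw inputs.          */
/* Ties in distance are broken by the training id.                  */
proc sql;
	create table pairs as
	select q.qid,
	       t.id as train_id,
	       t.y  as train_y,
	       sqrt((q.x1 - t.x1)**2 + (q.x2 - t.x2)**2) as dist
	from t2_q as q, t1 as t
	order by q.qid, dist, t.id;
quit;

/* Rank within each scoring row: nn_rank = 1 is the nearest neighbor. */
data ranked;
	set pairs;
	by qid;
	if first.qid then nn_rank = 0;
	nn_rank + 1;
run;

/* Wide layout: nn1 = nearest training id, nn2 = second nearest, ... */
proc transpose data=ranked out=nn_wide(drop=_name_) prefix=nn;
	by qid;
	id nn_rank;
	var train_id;
run;

proc print data=nn_wide;
run;
```

Under that assumption, the first row of t2 (x1=10, x2=12) is closest to training id 2 (distance sqrt(5), about 2.24), followed by id 1 (distance sqrt(8), about 2.83). The cross join grows as rows(t1) x rows(t2), so this is only practical for small or moderately sized tables.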

Re: Nearest Neighbor

sinmathstat — Tue, 17 May 2016 21:54:55 GMT

Thanks for your reply.

I see.

However, I wonder whether there is any technical difficulty (or any benefit to not preserving the order) that keeps SAS from saving _N1, _N2, ... in nearest-to-farthest order?

Ordered _N1, _N2, ... columns would have some benefits, e.g. a customized weighted NN and/or an easy way to find the optimal K.
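
To make those two benefits concrete, here is a rough sketch of what an ordered neighbor list allows. It does not use PMBR output: it recomputes the neighbors by hand, assuming plain Euclidean distance on the raw x1 and x2 from t1 and t2 above. The inverse-distance weight and all new names (qid, nn_all, nn_cum, k_error, p1_weighted, ...) are illustrative assumptions, not anything PMBR produces.

```sas
/* Key the scoring rows (qid is a hypothetical name). */
data t2_keyed;
	set t2;
	qid = _n_;
run;

/* All (scoring row, training row) pairs, nearest first within qid. */
proc sql;
	create table nn_all as
	select q.qid,
	       q.y  as actual_y,
	       t.id as train_id,
	       t.y  as train_y,
	       sqrt((q.x1 - t.x1)**2 + (q.x2 - t.x2)**2) as dist
	from t2_keyed as q, t1 as t
	order by q.qid, dist, t.id;
quit;

/* Walk the ordered neighbors: row k summarizes the k nearest ones. */
data nn_cum;
	set nn_all;
	by qid;
	if first.qid then do;
		k = 0; votes1 = 0; wtot = 0; w1 = 0;
	end;
	w = 1 / (dist + 1e-6);             /* assumed inverse-distance weight  */
	k + 1;                             /* number of neighbors used so far  */
	votes1 + (train_y = 1);            /* unweighted votes for y = 1       */
	wtot + w;                          /* total weight                     */
	w1 + w * (train_y = 1);            /* weight voting for y = 1          */
	p1_equal    = votes1 / k;          /* equal-weight estimate of P(y=1)  */
	p1_weighted = w1 / wtot;           /* distance-weighted estimate       */
	pred_y = (p1_equal > 0.5);         /* majority vote, ties go to y = 0  */
	miss   = (pred_y ne actual_y);     /* 1 if misclassified               */
run;

/* Misclassification rate on t2 for each candidate K. */
proc means data=nn_cum mean nway;
	class k;
	var miss;
	output out=k_error mean=miss_rate;
run;
```

Because the neighbors are processed from nearest to farthest, a single pass yields the result for every K at once, which is exactly what unordered _N columns do not allow.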

Re: Nearest Neighbor

sinmathstat — Tue, 17 May 2016 21:55:45 GMT

Thanks for your reply.

Unfortunately, it is not possible to sort them.