Solved: Re: Nearest neighbor in a data set

umeshgiri48 · Posted 04-02-2019 03:28 AM

Hi,

I would like to implement a nearest-neighbors algorithm. More specifically, I have more than 300,000 customers, sorted in ascending order by their Latitude value (that is the variable name). For each customer I need to find the nearest 100 customers: the 50 above and the 50 below in that sorted order. For example, the customer in the 1st row has no customers above him, so his closest 100 customers are the 100 rows below him (rows 2-101). Likewise, for the customer in the 51st row, the 50 above are rows 1-50 and the 50 below are rows 52-101. This should be repeated for all 300,000 customers, and the results appended into one new data set of 300,000 * 100 rows.

I am importing the file, sorting the data by Latitude, and then assigning a row number. After that I am stuck.

Kind regards

proc sort data=sample;
  by Latitude;
run;

/* Store each customer's position in the sorted order */
data sample;
  set sample;
  row_number = _n_;
run;
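
One possible way to continue from the row numbers, offered only as a rough sketch: for each customer, read the rows on either side with a second SET statement that uses POINT= for direct access, shifting the window near the top and bottom of the file so it always stays inside the data. The data set name toy, the variable CustomerID, and the latitude values are all assumptions made for illustration; half is set to 2 here so the result is easy to inspect, and would be 50 for the real problem.

```sas
/* Toy stand-in for the real data: CustomerID is a hypothetical variable
   name and the latitudes are made up */
data toy;
   input CustomerID Latitude;
   datalines;
101 12.97
102 13.08
103 17.38
104 18.52
105 19.07
106 22.57
107 26.85
108 28.61
;
run;

proc sort data=toy;
   by Latitude;
run;

%let half = 2;   /* rows wanted on each side; 50 in the real problem */

data neighbors (keep=CustomerID Latitude cust_row nbr_row nbr_id nbr_lat);
   set toy nobs=n;                 /* current customer; n = total rows */
   cust_row = _n_;
   /* shift the window near either end so it stays inside rows 1..n */
   lo = max(1, min(cust_row - &half, n - 2*&half));
   hi = min(n, lo + 2*&half);
   do p = lo to hi;
      if p ne cust_row then do;
         /* direct-access read of row p, renamed so it does not overwrite
            the current customer's values */
         set toy (rename=(CustomerID=nbr_id Latitude=nbr_lat)) point=p;
         nbr_row = p;              /* POINT= variables are not written out */
         output;
      end;
   end;
run;
```

Each customer contributes 2 * half rows to neighbors, so the full problem would produce roughly 300,000 * 100 = 30 million rows. Note that this window is based on rank in the sorted order, which is not always the same as the 100 customers closest in Latitude value.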

Rick_SAS · Posted 04-02-2019 06:42 AM

Use PROC MODECLUS. The NEIGHBOR option on the PROC MODECLUS statement produces a table that gives the observation number (or ID value) of nearest neighbors. For example, the following statements produce the observation numbers for the nearest neighbors:

/* Use K=p option to find nearest p-1 neighbors */
proc modeclus data=Sample method=1 k=101 Neighbor; /* nearest 100 nbrs */
   var x y z w;
run;

I suggest you start with a smaller problem, such as the nearest 2 neighbors, before attempting the large problem. The MODECLUS doc has an example for nearest neighbors.
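
As an illustration of the smaller problem, here is a minimal sketch that asks for the nearest 2 neighbors on a handful of rows. The data set Small, the CustomerID variable, and the latitude values are assumptions made up for this example; only Latitude comes from the question.

```sas
/* Toy data: six customers with made-up latitudes (illustrative only) */
data Small;
   input CustomerID Latitude;
   datalines;
101 12.97
102 13.08
103 17.38
104 18.52
105 19.07
106 22.57
;
run;

/* K=3 requests each point plus its 2 nearest neighbors */
proc modeclus data=Small method=1 k=3 neighbor;
   var Latitude;      /* distance is computed from this variable only */
   id CustomerID;     /* label neighbors by ID instead of observation number */
run;
```

Once the neighbor listing looks right on a few rows, the same statements can be scaled up by raising K= and pointing DATA= at the full data set.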
