umeshgiri48

Hi,

I would like to implement a nearest-neighbors algorithm. I have more than 300,000 customers, sorted in ascending order of the variable Latitude. For each customer I need the 100 nearest customers in that sorted order: the 50 rows above and the 50 rows below.

Near the ends of the data the window has to shift. The customer in row 1 has no customers above him, so his 100 neighbors are the 100 rows below him. The customer in row 51 has rows 1-50 above him and the next 50 rows below him. The same logic applies to all 300,000 customers.

The results for all customers should be appended into one new data set of 300,000 × 100 rows.

I import the file, sort the data by Latitude, and assign a row number. After that I am stuck.

Kind regards

proc sort data=sample;
  by Latitude;
run;

data sample;
  set sample;
  row_number = _n_;
run;
Accepted Solution
Rick_SAS

Use PROC MODECLUS. The NEIGHBOR option on the PROC MODECLUS statement produces a table that lists, for each observation, the observation numbers (or ID values) of its nearest neighbors. For example:

/* Use K=p option to find nearest p-1 neighbors */
proc modeclus data=Sample method=1 k=101 Neighbor; /* nearest 100 nbrs */
  var x y z w; /* placeholder names; list your own variables, e.g. Latitude */
run;

I suggest you start with a smaller problem, such as the nearest 2 neighbors, before attempting the large problem. The MODECLUS doc has an example for nearest neighbors.
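
If what is needed is literally the 50 rows above and the 50 rows below each customer in the sorted data (rather than the nearest neighbors by distance, which is what MODECLUS computes), a DATA step with POINT= access is one possible sketch. It assumes SAMPLE is already sorted by Latitude and still carries the row_number variable created in the question, with row_number equal to the physical row position. The output data set name NEIGHBORS and the variables focal_row, focal_lat, nbr_row, nbr_lat, lo, hi, and p are illustrative names, not from the original post. The window is shifted at the ends of the data so that every customer still gets 100 neighbors, assuming the data set has at least 101 rows.

```sas
/* Sketch only: fixed window of 100 neighbors by sorted position.       */
/* Assumes SAMPLE is sorted by Latitude and row_number = row position.  */
data neighbors;
  /* Outer SET: read each focal customer once; N = total row count */
  set sample(keep=row_number Latitude
             rename=(row_number=focal_row Latitude=focal_lat)) nobs=n;

  /* Window of 101 rows (focal row + 100 neighbors), shifted at the ends */
  lo = max(1, focal_row - 50);
  hi = min(n, lo + 100);
  lo = max(1, hi - 100);

  do p = lo to hi;
    if p ne focal_row then do;
      /* Inner SET: direct access to the neighbor in row P */
      set sample(keep=row_number Latitude
                 rename=(row_number=nbr_row Latitude=nbr_lat)) point=p;
      output;
    end;
  end;

  drop lo hi;
run;
```

With these assumptions the result has one row per (focal customer, neighbor) pair, which for 300,000 customers is about 30 million rows, so it is worth checking the first and last few focal rows on a small subset first.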
