<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: shortest distance in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327611#M73086</link>
    <description>&lt;P&gt;Customizing only makes sense if you have some other manner of filtering your calculations. For example, if this was spatial you might limit it to neighbouring provinces/states.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 26 Jan 2017 01:45:57 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2017-01-26T01:45:57Z</dc:date>
    <item>
      <title>shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326922#M72895</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset with around 100000 records. I want to find only the three records that have the shortest distance to a particular record. If I run the DISTANCE procedure, the program outputs a matrix of 100000 variables giving the distance of each record to all other records. That is not required, and it unnecessarily eats up my system time and sometimes even hangs my system. If there is any other procedure that gives me just the records with the shortest distance, please share.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 03:05:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326922#M72895</guid>
      <dc:creator>deega</dc:creator>
      <dc:date>2017-01-24T03:05:12Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326933#M72900</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What do you mean by a particular record?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 04:17:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326933#M72900</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2017-01-24T04:17:44Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326943#M72903</link>
      <description>&lt;P&gt;Is there any other way besides calculating all distances to identify the shortest 3 distances?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 04:49:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326943#M72903</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-24T04:49:07Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326947#M72905</link>
      <description>&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt; That is exactly my question. A few days ago I used SPSS Modeler; with its KNN node, we can quickly find the nearest record ...</description>
      <pubDate>Tue, 24 Jan 2017 05:02:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326947#M72905</guid>
      <dc:creator>deega</dc:creator>
      <dc:date>2017-01-24T05:02:13Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326948#M72906</link>
      <description>&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/42042"&gt;@stat_sas&lt;/a&gt;&lt;BR /&gt;by particular record, i mean for each record, something like below&lt;BR /&gt;Record No. Nearest Record 1 Nearest Record 2 Nearest Record 3&lt;BR /&gt;1 5 6 7&lt;BR /&gt;2 1 2 3&lt;BR /&gt;3 2 1 6&lt;BR /&gt;4 5 1 6&lt;BR /&gt;5 : : :&lt;BR /&gt;6 : : :&lt;BR /&gt;7 : : :&lt;BR /&gt;8 : : :&lt;BR /&gt;9 : : :&lt;BR /&gt;10 : : :&lt;BR /&gt;:&lt;BR /&gt;:&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 24 Jan 2017 05:09:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326948#M72906</guid>
      <dc:creator>deega</dc:creator>
      <dc:date>2017-01-24T05:09:30Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326951#M72908</link>
      <description>&lt;P&gt;Not sure, but one way to get that is to use k-means clustering. Make a reference value and use it in PROC FASTCLUS via the INSTAT option to get the distances with respect to the reference value. Then sort the output data set to flag the 3 closest values.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 05:42:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/326951#M72908</guid>
      <dc:creator>stat_sas</dc:creator>
      <dc:date>2017-01-24T05:42:31Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327009#M72920</link>
      <description>&lt;P&gt;Deega:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;1 5 6 7&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;2 1 2 3&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;3 2 1 6&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;4 5 1 6&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In this sample data set, you say 5, 6, and 7 are the records that are closest to Record 1. Can you show a small example of how you got 5, 6, and 7?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;What variable(s) (not shown here) are used for this decision?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Provide an example and illustrate how you find them. Finding an algorithm that does this fast is simpler once your example is clear.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 10:43:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327009#M72920</guid>
      <dc:creator>KachiM</dc:creator>
      <dc:date>2017-01-24T10:43:21Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327044#M72925</link>
      <description>&lt;P&gt;I assume you want the Euclidean distance.&lt;/P&gt;
&lt;P&gt;If you know the "particular record", then this is an easy problem that you can complete in the DATA step.&lt;/P&gt;
&lt;P&gt;1. Put the particular record first in the data set (or hard-code it into an array).&lt;/P&gt;
&lt;P&gt;2. Use the EUCLID function to compute the Euclidean distance between the particular record and the others.&lt;/P&gt;
&lt;P&gt;3. Sort the data by distance and use the first 3 records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, the following DATA step computes the Euclidean distances between the numerical measurements for the first observation ("Alfred") and the other observations. The result shows that Mary, William, and Janet are the students whose numerical measurements are closest to Alfred, as measured by the Euclidean distance function:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Dist(drop=i);
array refPt [3] _temporary_; /* automatically RETAINed */
array diff  [3] _temporary_;
set sashelp.class;
array Pt [*] Height Weight Age;

/* initialize refPt, or use 
   array refPt[3] (val1 val2 val3); */
if _N_ = 1 then
   do i = 1 to dim(Pt);
      refPt[i] = Pt[i];
   end;

/* compute difference between Pt and refPt */
do i = 1 to dim(pt);
   diff[i] = refPt[i] - Pt[i];
end;
dist = euclid(of diff[*]);
if _N_&amp;gt;1;
run;

proc sort data=Dist; 
by dist;
run;

proc print data=Dist(obs=3);
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;You can generalize this problem. Instead of finding the points nearest to a single reference point, you can find the points in one group that are nearest to points in another group. See the article &lt;A href="http://blogs.sas.com/content/iml/2016/09/28/distance-between-two-group.html" target="_self"&gt;"Distances between observations in two groups."&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 13:25:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327044#M72925</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2017-01-24T13:25:07Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327168#M72963</link>
      <description>&lt;P&gt;It was a logical question. You can't find the nearest neighbour without calculating all values first. Perhaps the KNN process in SPSS precalculates the distance and then obtains the nearest neighbour when you request it. This means it isn't calculating the distance matrix every time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You need to provide more details about what the issue is in your current process. Ideally, you can provide sample input, your code, and a note on which step is inefficient. Then we'll be able to suggest alternatives for you.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC DISTANCE is relatively fast, but if you don't need to run it each time and can cache the results somehow, that's a better process.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 20:32:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327168#M72963</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-24T20:32:04Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327237#M72983</link>
      <description>&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&lt;BR /&gt;Thanks for the reply. I have not used arrays much and am trying to understand your program. Could you please tell me what _N_ is here?</description>
      <pubDate>Wed, 25 Jan 2017 03:09:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327237#M72983</guid>
      <dc:creator>deega</dc:creator>
      <dc:date>2017-01-25T03:09:00Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327242#M72984</link>
      <description>&lt;P&gt;_N_ is an automatic variable that counts the number of times the DATA step has iterated. It is typically used as a pseudo row counter.&amp;nbsp;&lt;/P&gt;
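&lt;P&gt;As a small illustration (a sketch added for clarity, not part of the original reply), the following DATA step writes _N_ to the log for the first three iterations over sashelp.class. Because there is a single SET statement and nothing is filtered before the PUT statement, _N_ happens to match the observation number here; that is a property of this example, not a general rule.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   set sashelp.class;
   /* _N_ is 1 on the first iteration and increases by 1 on each later one */
   if _N_ le 3 then put _N_= Name=;
run;
&lt;/CODE&gt;&lt;/PRE&gt;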
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#p0e0mk25gs9binn1s9jiu4otau29.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#p0e0mk25gs9binn1s9jiu4otau29.htm&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 03:36:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327242#M72984</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-25T03:36:20Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327251#M72988</link>
      <description>&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt; If I have around 1000000 records and for each record I need at least three nearest records (that is, KNN with K=3), will this work? PROC DISTANCE is not working because the output is a 1000000 * 1000000 matrix.</description>
      <pubDate>Wed, 25 Jan 2017 05:07:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327251#M72988</guid>
      <dc:creator>deega</dc:creator>
      <dc:date>2017-01-25T05:07:52Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327331#M73007</link>
      <description>&lt;P&gt;Please clarify two issues:&lt;/P&gt;
&lt;P&gt;1. How many variables are in this problem?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. Do you have ONE reference point, and you want the three closest from among&amp;nbsp;1M observations? Or do you have 1M points and for each of those you want to find the three nearest neighbors?&amp;nbsp; The first case requires computing 1M distances. The second case requires computing 1E12 distances.&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 10:47:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327331#M73007</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2017-01-25T10:47:48Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327604#M73084</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&lt;/P&gt;&lt;P&gt;The shape of my dataset is as follows:&lt;/P&gt;&lt;TABLE border="0" cellspacing="0" cellpadding="0"&gt;&lt;TBODY&gt;
&lt;TR&gt;&lt;TD&gt;S. No.&lt;/TD&gt;&lt;TD&gt;Var1&lt;/TD&gt;&lt;TD&gt;Var2&lt;/TD&gt;&lt;TD&gt;Var3&lt;/TD&gt;&lt;TD&gt;Var4&lt;/TD&gt;&lt;TD&gt;Var5::&lt;/TD&gt;&lt;TD&gt;Var30&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A2&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A3&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A4&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A5&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A6&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A7&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A8&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A9&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A10::&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;&lt;TD&gt;A1000000&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;0&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;/TR&gt;
&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want KNN (K=3) for each observation from A1 to A1000000. Actually I had 200M observations. First I made clusters using FASTCLUS and planned to use the DISTANCE procedure to calculate distances, then sort and take the top 3 records. Some clusters have few records, but some still have a large number of records. I was able to use the DISTANCE procedure up to 100,000 records, but it stops responding above this limit. When I tried the KNN node in SPSS, it handled larger numbers of records too, so I thought of writing a custom distance program instead of using PROC DISTANCE.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please advise me on how to handle this situation.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jan 2017 01:21:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327604#M73084</guid>
      <dc:creator>deega</dc:creator>
      <dc:date>2017-01-26T01:21:26Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327611#M73086</link>
      <description>&lt;P&gt;Customizing only makes sense if you have some other manner of filtering your calculations. For example, if this was spatial you might limit it to neighbouring provinces/states.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jan 2017 01:45:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327611#M73086</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-01-26T01:45:57Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327634#M73100</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/85288"&gt;@deega&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are your VARS&amp;nbsp;really integers from 0 to 5&amp;nbsp;as in your sample?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so, then instead of an algorithm to select points and determine which have the closest locations to a given point,&amp;nbsp;you could systematically step through the nearest possible locations to see which are occupied by a data point.&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jan 2017 03:58:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327634#M73100</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2017-01-26T03:58:04Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327637#M73102</link>
      <description>&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&lt;BR /&gt;&lt;BR /&gt;No, the variables are not integers; they are real numbers like 1.23, 0.28, etc. Originally they were integers in different ranges: some variables were binary, some were in the range 1 to 10, some were in the range 30 to 70, and so on. To create clusters and compute distances I standardized the data, and as a result the values became decimals.</description>
      <pubDate>Thu, 26 Jan 2017 04:49:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327637#M73102</guid>
      <dc:creator>deega</dc:creator>
      <dc:date>2017-01-26T04:49:46Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327734#M73121</link>
      <description>&lt;P&gt;What is the business problem that you are trying to solve? &amp;nbsp;Why do you think you need to compute nearest neighbors of 1M points? &amp;nbsp;Will an approximate solution suffice?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think you are asking for a very lengthy computation. This computation&amp;nbsp;requires computing (1E6)**2 = 1E12 distances. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suggest that you read my articles on computing nearest-neighbor distances. In one I show &lt;A href="http://blogs.sas.com/content/iml/2016/09/14/nearest-neighbors-sas.html" target="_self"&gt;how to compute the k nearest neighbors by using PROC MODECLUS.&lt;/A&gt;&amp;nbsp; For your data example, the syntax is&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;ods select none;
ods output neighbor=Neighbor;
proc modeclus data=Have method=1 k=4 Neighbor;
   var Var:;
   ID S_No;
run;
ods select all;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I ran some tests on your 30-dimensional data. I&amp;nbsp;estimate that you can compute the three nearest neighbors for&amp;nbsp;&lt;/P&gt;
&lt;P&gt;30,000 obs in &amp;nbsp; 1 minute&lt;/P&gt;
&lt;P&gt;50,000 obs in 4.5 minutes&lt;/P&gt;
&lt;P&gt;75,000 obs in 11.3 minutes.&lt;/P&gt;
&lt;P&gt;From these kinds of experiments&amp;nbsp;and the fact that the&amp;nbsp;computation is quadratic in the number of observations, you can predict that&amp;nbsp;1M observations would require about 40 hours to run in PROC MODECLUS, assuming adequate resources such as RAM.&lt;/P&gt;
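&lt;P&gt;A rough way to reproduce that kind of estimate (a sketch added for clarity, not part of the original reply) is to assume purely quadratic scaling from the largest timed run above, 75,000 obs in 11.3 minutes. The variable names are illustrative. This simple scaling gives roughly 33 hours; the timings listed above grow somewhat faster than quadratic, so a larger figure is plausible.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
   n0 = 75000;               /* observations in the largest timed run */
   t0 = 11.3;                /* minutes for that run */
   n1 = 1e6;                 /* target number of observations */
   t1 = t0 * (n1 / n0)**2;   /* assumes run time grows as n**2 */
   hours = t1 / 60;
   put hours= 6.1;           /* writes the estimate to the log */
run;
&lt;/CODE&gt;&lt;/PRE&gt;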
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Personally, I would ask whether it is possible to reformulate the problem. Work smarter, not harder.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jan 2017 14:41:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/327734#M73121</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2017-01-26T14:41:55Z</dc:date>
    </item>
    <item>
      <title>Re: shortest distance</title>
      <link>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/330117#M73989</link>
      <description>&lt;P&gt;Actually as per my business problem Jaccard Distance suits the best.&lt;/P&gt;&lt;P&gt;How practical is it to use the method of bubble sort while calculating Jaccard Distance, something like below:&lt;/P&gt;&lt;P&gt;Suppose there are 1M observations and I need three NN from these 1 M observations.&amp;nbsp;If I calculate JD for obs1 &amp;amp; 2, then compare them, take the shortest and discard the other one. Then again compare the shortest of (1,2) with JD of 3, then again take the shortest.. and so on...&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 06 Feb 2017 08:52:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/shortest-distance/m-p/330117#M73989</guid>
      <dc:creator>deega</dc:creator>
      <dc:date>2017-02-06T08:52:49Z</dc:date>
    </item>
  </channel>
</rss>

