<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Euclidean Distance Without SAS/STATS in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516941#M26361</link>
    <description>&lt;P&gt;Even with proc distance you would want to work with the compounds as columns and batches as rows. Using a data step, read the first batch on the first iteration into a retained array and then read the other batches into another array. Subtract the _0 batch compound values and compute the Euclidean norm with the EUCLID() function.&lt;/P&gt;</description>
    <pubDate>Thu, 29 Nov 2018 04:23:34 GMT</pubDate>
    <dc:creator>PGStats</dc:creator>
    <dc:date>2018-11-29T04:23:34Z</dc:date>
    <item>
      <title>Euclidean Distance Without SAS/STATS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516904#M26355</link>
      <description>&lt;P&gt;Current version: 9.04.01M5P091317&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for viewing&amp;nbsp;this post.&lt;/P&gt;&lt;P&gt;I do not have access to SAS/STATS and thus cannot use "proc distance", but I need to calculate the Euclidean distance between multi-dimensional points.&amp;nbsp; Each column of data represents a point.&lt;/P&gt;&lt;P&gt;The first column of the dataset is a list of compounds for a batch number (batchno) equal to 0 with name&amp;nbsp;_0.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="1.JPG" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/25240i7707F00739DD7F52/image-size/large?v=v2&amp;amp;px=999" role="button" title="1.JPG" alt="1.JPG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;The remaining 22583 columns (not shown)&amp;nbsp;are other batches labeled _375, _376, _377,...,_22583 and&amp;nbsp;stem from a transpose of the original dataset.&lt;/P&gt;&lt;P&gt;There are 11 observations per batch (column), all type&amp;nbsp;numeric 8, and there are no missing values.&lt;/P&gt;&lt;P&gt;The goal is to calculate the Euclidean distance between batch _0 and all the other batches to determine where batch _0 came from based on the minimum distance.&lt;/P&gt;&lt;P&gt;The square of the difference is calculated for some batches in the source code shown&amp;nbsp;below.&amp;nbsp; Each line creates a new column&amp;nbsp;in the table&amp;nbsp;view.&amp;nbsp; Then the sum can be calculated&amp;nbsp;using "proc means".&amp;nbsp;&lt;/P&gt;&lt;P&gt;I acknowledge that the&amp;nbsp;square root and minimum distance are also necessary to complete the process.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data euclid;
	set fewerbatches;
		b375=((_0-_375))**2;
		b376=((_0-_376))**2;
		b377=((_0-_377))**2;
		b378=((_0-_378))**2;
		b379=((_0-_379))**2;
		b380=((_0-_380))**2;
		b381=((_0-_381))**2;
		b382=((_0-_382))**2;
		b383=((_0-_383))**2;
		b384=((_0-_384))**2;
		b385=((_0-_385))**2;
		b386=((_0-_386))**2;
		b387=((_0-_387))**2;
		b388=((_0-_388))**2;
		b389=((_0-_389))**2;
		b390=((_0-_390))**2;
	run;

proc means data=euclid sum;
	var b375;
	run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It is clear to me that an array or macro would reduce the potential size of the code.&amp;nbsp; For example, I reduced the size and created an array:&lt;/P&gt;&lt;P&gt;array btchs(142) b375-b515;&lt;/P&gt;&lt;P&gt;This created several more columns of missing data.&amp;nbsp; Attempts to fill the columns with the squared differences were partially successful, but just as redundant as what I have shown above.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Again, thank you in advance for the assistance.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Jane&lt;/P&gt;</description>
      <pubDate>Wed, 28 Nov 2018 23:12:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516904#M26355</guid>
      <dc:creator>jawhitmire</dc:creator>
      <dc:date>2018-11-28T23:12:06Z</dc:date>
    </item>
    <item>
      <title>Re: Euclidean Distance Without SAS/STATS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516920#M26357</link>
      <description>So your picture shows what you have; what do you want as output? Please post your data as text, including what you have and your desired output. I would also suggest not a macro but either SQL or a data step, and I think a data step is enough here. Euclidean distance is a straightforward calculation, so it’s just a matter of finding the fastest option. I suspect a data transformation will be needed as well.</description>
      <pubDate>Thu, 29 Nov 2018 00:15:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516920#M26357</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-11-29T00:15:05Z</dc:date>
    </item>
    <item>
      <title>Re: Euclidean Distance Without SAS/STATS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516922#M26358</link>
      <description>&lt;P&gt;I think that you want something based on this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data want;
 set have;
 array a {375:22583} _375-_22583;
 array b {375:22583} b375 - b22583;
 do i= 375 to 22583;
   b[i] = ( _0 - a[i])**2;
 end;
run;&lt;/PRE&gt;
&lt;P&gt;This form of array definition lets you use 375 to reference the first item in the array, 376 the second, and so on.&lt;/P&gt;
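&lt;P&gt;As a small illustration of those index bounds, here is a sketch on made-up values (the three variables and their values are hypothetical, not taken from the original data). LBOUND() and HBOUND() return the lower and upper bounds of the array, so the loop limits do not have to be typed a second time:&lt;/P&gt;
&lt;PRE&gt;data _null_;
 /* hypothetical three-batch example; the values are illustrative only */
 array a {375:377} _375-_377 (10 20 30);
 do i = lbound(a) to hbound(a);
   value = a[i];   /* a[375] is _375, a[376] is _376, and so on */
   put i= value=;  /* write the index and the element to the log */
 end;
run;&lt;/PRE&gt;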
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would try this with a MUCH reduced set of variables and indices, as there is the potential to create some really outrageous data sets.&lt;/P&gt;
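&lt;P&gt;To get from the squared differences to an actual distance, one possible continuation is sketched here. It assumes that each row of HAVE is one compound and that only batches _375-_400 are present (a reduced range picked purely for illustration). The squares are accumulated across rows in a temporary array and the square root is taken on the last row. The names DIST, BATCH and DISTANCE are made up for this example:&lt;/P&gt;
&lt;PRE&gt;data dist (keep=batch distance);
 set have end=last;
 array a  {375:400} _375-_400;
 array ss {375:400} _temporary_;   /* temporary arrays are retained across rows */
 do i = 375 to 400;
   /* running sum of squares per batch; SUM() ignores the initial missing value */
   ss[i] = sum(ss[i], (_0 - a[i])**2);
 end;
 /* after the last compound, write one row per batch */
 if last then do batch = 375 to 400;
   distance = sqrt(ss[batch]);
   output;
 end;
run;

proc sort data=dist;
 by distance;   /* the closest batch ends up in the first row */
run;&lt;/PRE&gt;
&lt;P&gt;This stays within Base SAS and avoids creating one new column per batch, which matters if the full range of batches is used.&lt;/P&gt;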
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suspect that the transpose wasn't needed if there is some identifying information you didn't share. BY-group processing, taking the initial value as the first of a BY group and retaining it for use with the later batches, may have been an alternate approach:&lt;/P&gt;
&lt;PRE&gt;data have;
   input group value;
datalines;
1  23
1  345
1  567
2  1.3
2  1.9
2  .4
3  555
3  666
3  777
;
run;

data want;
   set have;
   by group;
   retain firstval;
   if first.group then firstval=value;
   else b = (firstval - value)**2;
run;&lt;/PRE&gt;</description>
      <pubDate>Thu, 29 Nov 2018 00:24:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516922#M26358</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2018-11-29T00:24:37Z</dc:date>
    </item>
    <item>
      <title>Re: Euclidean Distance Without SAS/STATS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516941#M26361</link>
      <description>&lt;P&gt;Even with proc distance you would want to work with the compounds as columns and batches as rows. Using a data step, read the first batch on the first iteration into a retained array and then read the other batches into another array. Subtract the _0 batch compound values and compute the Euclidean norm with the EUCLID() function.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Nov 2018 04:23:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/516941#M26361</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-11-29T04:23:34Z</dc:date>
    </item>
    <item>
      <title>Re: Euclidean Distance Without SAS/STATS</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/517101#M26365</link>
      <description>&lt;P&gt;Thank you for the quick reply.&amp;nbsp; I reduced the size of the file and the following code worked to compute the squared difference for each variable.&amp;nbsp; Cheers!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data euclid;
	set fewerbatches;
	array a {375:400} _375-_400;
	array b {375:400} b375-b400;
	do i=375 to 400;
		b[i]=(_0-a[i])**2;
	end;
	drop i;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The original file had batches by rows with no grouping variable, as each batch is unique.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now that I have the squared differences, I need to find the sum of each batch column, b375-b400, take the square root of the sum, then perhaps use proc means to find the minimum Euclidean distance.&amp;nbsp; Based on feedback received from 3 SAS community experts, it seems the next step would be easier if I transposed dataset "Euclid" back to rows?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you to all for the quick reply and assistance.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Nov 2018 15:55:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Euclidean-Distance-Without-SAS-STATS/m-p/517101#M26365</guid>
      <dc:creator>jawhitmire</dc:creator>
      <dc:date>2018-11-29T15:55:49Z</dc:date>
    </item>
  </channel>
</rss>

