<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Nearest neighbour between two datasets in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124494#M6516</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This sounds like a Case/Control problem, where you match patients based on several variables.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If so you can search on those terms to help find relevant solutions. Probabilistic match is a good one to look into as well.&lt;/P&gt;&lt;P&gt;I'm a fan of the Mayo Clinic Macros but if you want a solution that is customized that's also doable. There's another thread on here today where &lt;A __default_attr="5253" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt; lists an algorithm that may be close to what you want.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If it's not a case/control problem, ignore the above &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 28 Aug 2013 22:01:33 GMT</pubDate>
    <dc:creator>Reeza</dc:creator>
    <dc:date>2013-08-28T22:01:33Z</dc:date>
    <item>
      <title>Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124493#M6515</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I would like to implement a nearest-neighbour algorithm. More specifically, I have, say, 20000 customers in a dataset A that I would like to match against another dataset B containing 2000 customers, and for each customer in A find the 50 nearest neighbours (most similar customers) in B. All of the relevant variables would be more or less standardised. For instance, income and age would both be numerical values with roughly the same mean and variance. Sadly, I haven't been able to find any procedure in SAS that can match one dataset against another in this fashion. If anyone can point me to the right one for this I would be euphoric!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Seeing as I can't find a fitting procedure, I'm thinking of how to do this using data steps:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1. Merge the datasets A and B together to create a similarity matrix named C. For each customer in dataset A, a variable would be created for each customer in dataset B showing the similarity using Euclidean distance. Furthermore, a linked variable would be created to hold the unique id of the customer in question.&lt;/P&gt;&lt;P&gt;2. For each customer in C, select the 50 variables that have the lowest value (=highest similarity) and keep the unique customer IDs connected to these values. The result would be that, for each customer in A, I have 50 unique customer IDs that identify the 50 records in B that match the customer best.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is this the right way to do it?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Kind regards,&lt;/P&gt;&lt;P&gt;Bjarke&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 Aug 2013 21:55:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124493#M6515</guid>
      <dc:creator>Bjarke</dc:creator>
      <dc:date>2013-08-28T21:55:46Z</dc:date>
    </item>
    <item>
      <title>Re: Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124494#M6516</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This sounds like a Case/Control problem, where you match patients based on several variables.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If so you can search on those terms to help find relevant solutions. Probabilistic match is a good one to look into as well.&lt;/P&gt;&lt;P&gt;I'm a fan of the Mayo Clinic Macros but if you want a solution that is customized that's also doable. There's another thread on here today where &lt;A __default_attr="5253" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt; lists an algorithm that may be close to what you want.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If it's not a case/control problem, ignore the above &lt;img id="smileyhappy" class="emoticon emoticon-smileyhappy" src="https://communities.sas.com/i/smilies/16x16_smiley-happy.png" alt="Smiley Happy" title="Smiley Happy" /&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 Aug 2013 22:01:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124494#M6516</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2013-08-28T22:01:33Z</dc:date>
    </item>
    <item>
      <title>Re: Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124495#M6517</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I think you had a typo in 2 and that you meant to say "select the 50 records ..".&amp;nbsp; How many variables do you have that are common and standardized in the two datasets?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 Aug 2013 22:04:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124495#M6517</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2013-08-28T22:04:33Z</dc:date>
    </item>
    <item>
      <title>Re: Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124496#M6518</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks for the quick responses!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Reeza:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I must say that I can't see the similarity with Case/Control. Perhaps I should expand a bit more on the purpose of this data mining. The 2000 customers in B could be customers who have just left the insurance company I work at. I would then like to see if I can to some extent predict whether similar customers will leave as well. Similarly, it could be a very valuable tool for choosing which customers to target for cross-selling purposes. In this case the 2000 customers would already have been targeted, and the rest of the portfolio would be matched to select customers similar to those that had a good hit-rate when attempting to sell additional products.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Arthur:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Well, my point was actually to have a variable in the dataset C for each customer in B. By selecting the 50 variables with the lowest distance I would be able to identify the top 50 records. I've edited the original text to clarify this. Another approach would be to create an observation for each combination of A and B, resulting in roughly 40 million observations, and then select from these. My hope is that the variable-based approach will be faster. Moreover, I might want to expand the datasets with even more customers if possible.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;As a start, I believe perhaps 4 variables will be in common and used for the similarity rating. However, it would be very nice if the method could handle up to 10 variables.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 Aug 2013 22:26:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124496#M6518</guid>
      <dc:creator>Bjarke</dc:creator>
      <dc:date>2013-08-28T22:26:01Z</dc:date>
    </item>
    <item>
      <title>Re: Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124497#M6519</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Sounds more like predicting churn than it does trying to find nearest neighbor.&amp;nbsp; Take a look at: &lt;A href="http://www2.sas.com/proceedings/sugi27/p114-27.pdf" title="http://www2.sas.com/proceedings/sugi27/p114-27.pdf"&gt;http://www2.sas.com/proceedings/sugi27/p114-27.pdf&lt;/A&gt;&lt;/P&gt;&lt;P&gt;and do a google search on: predicting churn sas&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 28 Aug 2013 23:55:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124497#M6519</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2013-08-28T23:55:15Z</dc:date>
    </item>
    <item>
      <title>Re: Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124498#M6520</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;SAS has procedure &lt;STRONG&gt;modeclus&lt;/STRONG&gt; that can find nearest neighbors quite efficiently. Look at the following example&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/* Create some example datasets */&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;data A B;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;array v var1-var5;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;call streaminit(56645);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;do custId=1 to 200;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _n_ = 1 to 5;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; v{_n_} = rand("NORMAL");&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output A;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;do custId=1 to 2000;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _n_ = 1 to 5;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; v{_n_} = rand("NORMAL");&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output B;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/* Concatenate the datasets. 
Change Ids in observations from dataset A */&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;data C(keep=custId var1-var5);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;set A (in=inA) B;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;if inA then custId = -custId;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/* Find the nearest neighbors (ignore the warning) */&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;ods _all_ close;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;proc modeclus data=C dk=51 neighbor;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;var var1-var5;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;id custId;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;ods output Neighbor=D;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;ods listing;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt; &lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;/* Reformat the output; keep only Ids originally from dataset A */&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;data E(keep=custId neighborPos neighborId);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;retain custId;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;set D;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;if not missing(ID) then do;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; custId = -input(ID,best.);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; neighborPos = 0;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;neighborPos + 1;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;if custId &amp;lt; 0 then stop;&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;neighborId = input(Nbor,best.);&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;run;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PG&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 29 Aug 2013 02:19:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124498#M6520</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2013-08-29T02:19:25Z</dc:date>
    </item>
    <item>
      <title>Re: Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124499#M6521</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I still think Case/Control Problem. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;See the DIST Macro from Mayo Clinic:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A class="active_link" href="http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/software/locally-written-sas-macros" title="http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/software/locally-written-sas-macros"&gt;http://www.mayo.edu/research/departments-divisions/department-health-sciences-research/division-biomedical-statistics-informatics/software/locally-written-sas-macros&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 29 Aug 2013 04:03:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124499#M6521</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2013-08-29T04:03:22Z</dc:date>
    </item>
    <item>
      <title>Re: Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124500#M6522</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I am sorry that I have not responded; I've been swamped at work, so I haven't been able to test anything until now.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@Reeza)&lt;BR /&gt;You were right, it is a Case/Control problem! I've had to modify the %dist macro slightly, but it is just great for my needs! Thank you!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@PGStats)&lt;BR /&gt;I'm sorry, but I couldn't quite make sense of your code. However, I used some of your thoughts when implementing my example.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;@Arthur)&lt;BR /&gt;We are predicting churn in general using other methods, which are better suited to the purpose. The main point of the program is to find customers to approach for a sales attempt. Nevertheless, I would find it interesting if this nearest-neighbour classification could be used in other areas as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="j-post-author "&gt;&lt;/SPAN&gt;I've added the code for a simple example using my approach below:&lt;/P&gt;&lt;PRE __jive_macro_name="quote" class="jive_text_macro jive_macro_quote" modifiedtitle="true"&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;filename macro '\\P-114-230-013\Arbejdsmapper\BJF\Mersalg\Macro';&lt;/P&gt;
&lt;P&gt;%inc macro(nobs);&lt;/P&gt;
&lt;P&gt;%inc macro(distMacro_modified);&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;/* Creating example dataset */&lt;/P&gt;
&lt;P&gt;data all;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; infile datalines dsd;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; input Group: 1.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* 0 = customer with a sales attempt, 1 = customer without a sales attempt */&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ID:&amp;nbsp;&amp;nbsp;&amp;nbsp; 2.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Match1:2.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Match2:2.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SuccesfulSale:1.; /* Boolean indicating whether a sales attempt was successful (missing if no attempt was made). */&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; datalines;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 1, 7,&amp;nbsp; 4,&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 2, 9,&amp;nbsp; 6,&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 3, 22, 8,&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 4, 27, 10,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 5, 5,&amp;nbsp; 5,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 6, 22, 7,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 7, 17, 8,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 8, 5,&amp;nbsp; 8,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 9, 8,&amp;nbsp; 9,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 10, 10, 11,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 11, 14, 12,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 12, 18, 13,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 13, 21, 14,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1, 14, 23, 15,&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 15, 2,&amp;nbsp; 22,&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 16, 9,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 5,&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 17, 17, 13,&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 18, 29,&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2,&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 19, 14, 14,&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; 0, 20,&amp;nbsp; 4, 17,&amp;nbsp;&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;proc sort data=all; by group id; run;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;/* Use slightly altered version of the distance macro */&lt;/P&gt;
&lt;P&gt;%dist(data=all,group=group,id=id,mvars=Match1 Match2,wts=1 2.5,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out=distanceMatrix,transf=1,dist=2);&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;/* Reshape the distance matrix to one row per pair (customer without sales attempt, customer with sales attempt), including the distance between them. */&lt;/P&gt;
&lt;P&gt;data distanceMatrixSelectNearest;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; set distanceMatrix;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; array idSales&amp;nbsp;&amp;nbsp;&amp;nbsp; _C_ID1-_C_ID10;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; array dist&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; _C1-_C10;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; keep id idNearest distanceNearest;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _i_ = 1 to 10;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; distanceNearest = dist{_i_};&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; idNearest&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; = idSales{_i_};&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;
&lt;P&gt;proc sort; by id distanceNearest; run;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;/* Selects the 3 nearest neighbours */&lt;/P&gt;
&lt;P&gt;data distanceMatrixSelectNearest2;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retain nearestNeighbourCount;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; set distanceMatrixSelectNearest;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; by id distanceNearest;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if first.id then nearestNeighbourCount = 1; else nearestNeighbourCount = nearestNeighbourCount + 1;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if nearestNeighbourCount &amp;lt;= 3;&lt;/P&gt;
&lt;P&gt;proc sort; by idNearest; run;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;/* Merges sales information onto the NN matrix */&lt;/P&gt;
&lt;P&gt;proc sort data=all out=salesAttempt (drop=group match1 match2); where group=0; by id; run;&lt;/P&gt;
&lt;P&gt;data distanceMatrixSelectNearest3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; merge distanceMatrixSelectNearest2 (in=a)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; salesAttempt (rename=id=idNearest);&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; by idNearest;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if a;&lt;/P&gt;
&lt;P&gt;proc sort; by id nearestNeighbourCount; run;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;/* Create a single observation for each customer without a sales attempt, containing a sales prediction as well as the id of and distance to each nearest neighbour */&lt;/P&gt;
&lt;P&gt;data customersToCall;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retain nearestNeighbour1-nearestNeighbour3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retain nearestNeighbourSales1-nearestNeighbourSales3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retain nearestNeighbourDist1-nearestNeighbourDist3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; array nn_id nearestNeighbour1-nearestNeighbour3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; array nn_sale nearestNeighbourSales1-nearestNeighbourSales3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; array nn_dist nearestNeighbourDist1-nearestNeighbourDist3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; set distanceMatrixSelectNearest3;&lt;/P&gt;
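&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; by id;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* Clear the retained arrays for each new customer so that values from the previous customer cannot carry over */&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if first.id then call missing(of nn_id{*}, of nn_sale{*}, of nn_dist{*});&lt;/P&gt;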
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; nn_id{nearestNeighbourCount} = idNearest;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; nn_dist{nearestNeighbourCount} = distanceNearest;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; nn_sale{nearestNeighbourCount} = succesfulSale;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; drop idNearest distanceNearest succesfulSale nearestNeighbourCount;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if last.id;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if last.id then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; totalDist = 0;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; salesPrediction = 0;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* Find the total distance */&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _i_ = 1 to 3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; totalDist = totalDist + nn_dist{_i_};&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; /* Add every sales attempt and take the average*/&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do _i_ = 1 to 3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; salesPrediction = salesPrediction + nn_sale{_i_};&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; salesPrediction = salesPrediction / 3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;
&lt;P&gt;proc sort; by descending salesPrediction; run;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;data customersToCall2;&lt;/P&gt;
&lt;P&gt;set customersToCall;&lt;/P&gt;
&lt;P&gt;by descending salesPrediction;&lt;/P&gt;
&lt;P&gt;if _n_ &amp;lt;= 3;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
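&lt;P&gt;&lt;/P&gt;
&lt;P&gt;/* A minimal sketch of the "one observation per combination" approach mentioned earlier in&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; this thread, intended only as a cross-check that does not depend on the %dist macro.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; Assumptions, none of which come from the macro: plain unweighted Euclidean distance on the&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; raw Match1 and Match2 values (the %dist call above uses wts= and transf=, so its distances&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; will probably differ), and the hypothetical names pairDistances, nearest3, targetId,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; salesId, distance and neighbourRank. */&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table pairDistances as&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select a.id as targetId,&amp;nbsp;&amp;nbsp; /* customer without a sales attempt */&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; b.id as salesId,&amp;nbsp;&amp;nbsp;&amp;nbsp; /* customer with a sales attempt */&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; b.SuccesfulSale,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sqrt((a.Match1 - b.Match1)**2 + (a.Match2 - b.Match2)**2) as distance&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; from work.all(where=(group=1)) as a,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; work.all(where=(group=0)) as b&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by targetId, distance;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;
&lt;P&gt;/* Keep the 3 closest sales-attempt customers for each target customer */&lt;/P&gt;
&lt;P&gt;data nearest3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; set pairDistances;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; by targetId;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if first.targetId then neighbourRank = 0;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; neighbourRank + 1;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if neighbourRank &amp;lt;= 3;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;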

&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Feel free to correct me on my code and my thoughts of using this approach.&lt;/P&gt;&lt;P&gt;Once again, thank you!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Kind regards,&lt;BR /&gt;Bjarke&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 20 Sep 2013 15:47:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124500#M6522</guid>
      <dc:creator>Bjarke</dc:creator>
      <dc:date>2013-09-20T15:47:50Z</dc:date>
    </item>
    <item>
      <title>Re: Nearest neighbour between two datasets</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124501#M6523</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;You might also use PROC FASTCLUS:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; 1. Use the SEED= option of the PROC FASTCLUS statement to include a data set &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; of observations &lt;/SPAN&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;around which you want &lt;/SPAN&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;other "new" observations to cluster;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; 2. Use the DATA= option of the PROC FASTCLUS statement to include a data set &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; of the "new" &lt;/SPAN&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;observations to be clustered; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp; 3. 
Set the MAXITER option of the PROC FASTCLUS statement to zero (MAXITER=0) &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; to prevent &lt;/SPAN&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;the procedure from changing &lt;/SPAN&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;the original central "seed" &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; observations (see #1 above).&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;Since PROC FASTCLUS is designed for interval/ratio variables, you can only incorporate &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;nominal &lt;/SPAN&gt;&lt;SPAN style="font-family: courier new,courier;"&gt; variables by creating separate clusters of observations within each category&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt; of the nominal &lt;/SPAN&gt;&lt;SPAN style="font-family: courier new,courier;"&gt; variables.&amp;nbsp; To do this, sort by the nominal variables beforehand, and &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;use the BY statement &lt;/SPAN&gt;&lt;SPAN style="font-family: courier new,courier;"&gt;in PROC FASTCLUS.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 24 Sep 2013 12:23:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Nearest-neighbour-between-two-datasets/m-p/124501#M6523</guid>
      <dc:creator>1zmm</dc:creator>
      <dc:date>2013-09-24T12:23:15Z</dc:date>
    </item>
  </channel>
</rss>

