<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic K-Nearest Neighbors for Zip Code Digits in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/K-Nearest-Neighbors-for-Zip-Code-Digits/m-p/283883#M14961</link>
    <description>&lt;P&gt;I've just begun working my way through the exercises in &lt;A href="http://statweb.stanford.edu/~tibs/ElemStatLearn/" target="_self"&gt;The Elements of Statistical Learning&lt;/A&gt;.&amp;nbsp; Exercise 2.8 asks you to use k-nearest neighbors to classify scanned&amp;nbsp;zipcode digits from greyscale values (gs1-gs256).&amp;nbsp; Here's some of my code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro knn;
%do i = 1 %to 5;
	/* k cycles through 1, 3, 5, 7, 15 */
	%let k = %scan(1 3 5 7 15,&amp;amp;i);
	proc discrim data=train method=npar k=&amp;amp;k out=train_k&amp;amp;k._out(keep=digit _into_)
				testdata=test testout=test_k&amp;amp;k._out(keep=digit _into_);	
				class digit;
				var gs1-gs256;
	run;
%end;
%mend;
%knn;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Using just digits 2 and 3, I get error rates on the test datasets (available at the book website) between 6% (for k=1) and 10% (for k=15).&amp;nbsp; Those don't agree with a couple of solutions on the web.&amp;nbsp; &lt;A href="http://tullo.ch/articles/elements-of-statistical-learning/" target="_self"&gt;Andrew Tulloch&lt;/A&gt;&amp;nbsp;shows error rates between 2% and 4%, while &lt;A href="http://waxworksmath.com/Authors/G_M/Hastie/WriteUp/weatherwax_epstein_hastie_solutions_manual.pdf" target="_self"&gt;Weatherwax and Epstein&lt;/A&gt;&amp;nbsp;have error rates between 9% and 11%.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there anyone else who has done the exercise and can confirm which of the three&amp;nbsp;answers (if any)&amp;nbsp;is correct?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Martin&lt;/P&gt;</description>
    <pubDate>Wed, 13 Jul 2016 15:12:05 GMT</pubDate>
    <dc:creator>mcs</dc:creator>
    <dc:date>2016-07-13T15:12:05Z</dc:date>
    <item>
      <title>K-Nearest Neighbors for Zip Code Digits</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/K-Nearest-Neighbors-for-Zip-Code-Digits/m-p/283883#M14961</link>
      <description>&lt;P&gt;I've just begun working my way through the exercises in &lt;A href="http://statweb.stanford.edu/~tibs/ElemStatLearn/" target="_self"&gt;The Elements of Statistical Learning&lt;/A&gt;.&amp;nbsp; Exercise 2.8 asks you to use k-nearest neighbors to classify scanned&amp;nbsp;zipcode digits from greyscale values (gs1-gs256).&amp;nbsp; Here's some of my code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro knn;
%do i = 1 %to 5;
	/* k cycles through 1, 3, 5, 7, 15 */
	%let k = %scan(1 3 5 7 15,&amp;amp;i);
	proc discrim data=train method=npar k=&amp;amp;k out=train_k&amp;amp;k._out(keep=digit _into_)
				testdata=test testout=test_k&amp;amp;k._out(keep=digit _into_);	
				class digit;
				var gs1-gs256;
	run;
%end;
%mend;
%knn;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Using just digits 2 and 3, I get error rates on the test datasets (available at the book website) between 6% (for k=1) and 10% (for k=15).&amp;nbsp; Those don't agree with a couple of solutions on the web.&amp;nbsp; &lt;A href="http://tullo.ch/articles/elements-of-statistical-learning/" target="_self"&gt;Andrew Tulloch&lt;/A&gt;&amp;nbsp;shows error rates between 2% and 4%, while &lt;A href="http://waxworksmath.com/Authors/G_M/Hastie/WriteUp/weatherwax_epstein_hastie_solutions_manual.pdf" target="_self"&gt;Weatherwax and Epstein&lt;/A&gt;&amp;nbsp;have error rates between 9% and 11%.&lt;/P&gt;
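&lt;P&gt;As a language-independent cross-check, here is a minimal sketch of the majority vote that PROC DISCRIM with method=npar performs for each test point (pure Python on toy two-class data; the function name and toy points are illustrative stand-ins for the gs1-gs256 features, not part of the exercise):&lt;/P&gt;

```python
# Minimal k-nearest-neighbors classifier sketch (pure Python, toy data).
# Each test point is assigned the majority class among its k closest
# training points, which is the rule PROC DISCRIM method=npar k=... applies.
from collections import Counter

def knn_classify(train, labels, point, k):
    # Squared Euclidean distance from every training row to the test point.
    dists = [(sum((a - b) ** 2 for a, b in zip(row, point)), lab)
             for row, lab in zip(train, labels)]
    dists.sort()                          # nearest first
    votes = [lab for _, lab in dists[:k]]
    return Counter(votes).most_common(1)[0][0]

# Toy stand-in for the greyscale features: class "2" near the origin,
# class "3" near (1, 1).
train = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
labels = ["2", "2", "3", "3"]
print(knn_classify(train, labels, (0.05, 0.05), 3))   # prints 2
```

&lt;P&gt;Running the same vote over the real zip.train/zip.test rows would give an independent error-rate count to compare against the three published answers.&lt;/P&gt;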
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there anyone else who has done the exercise and can confirm which of the three&amp;nbsp;answers (if any)&amp;nbsp;is correct?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Martin&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jul 2016 15:12:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/K-Nearest-Neighbors-for-Zip-Code-Digits/m-p/283883#M14961</guid>
      <dc:creator>mcs</dc:creator>
      <dc:date>2016-07-13T15:12:05Z</dc:date>
    </item>
    <item>
      <title>Re: K-Nearest Neighbors for Zip Code Digits</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/K-Nearest-Neighbors-for-Zip-Code-Digits/m-p/283929#M14965</link>
      <description>&lt;P&gt;Is there a reason you used PROC DISCRIM instead of PROC CLUSTER or PROC FASTCLUS?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jul 2016 00:53:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/K-Nearest-Neighbors-for-Zip-Code-Digits/m-p/283929#M14965</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-07-13T00:53:36Z</dc:date>
    </item>
    <item>
      <title>Re: K-Nearest Neighbors for Zip Code Digits</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/K-Nearest-Neighbors-for-Zip-Code-Digits/m-p/284075#M14990</link>
      <description>&lt;P&gt;I haven't used either of those before, and after a quick look at the documentation, I couldn't figure out how to make them do what I want.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you explain how clustering lets me classify digits?&amp;nbsp; I assume I would cluster the training dataset and then somehow use the output to score the test dataset, but I don't understand the details.&amp;nbsp; Specifically, how would I use the known value of the digit in the training dataset?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jul 2016 15:10:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/K-Nearest-Neighbors-for-Zip-Code-Digits/m-p/284075#M14990</guid>
      <dc:creator>mcs</dc:creator>
      <dc:date>2016-07-13T15:10:11Z</dc:date>
    </item>
  </channel>
</rss>