mcs
Obsidian | Level 7

I've just begun working my way through the exercises in The Elements of Statistical Learning.  Exercise 2.8 asks you to use k-nearest neighbors to classify scanned zipcode digits from greyscale values (gs1-gs256).  Here's some of my code.

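%macro knn;
	%local i k;
	%do i = 1 %to 5;
		%let k = %scan(1 3 5 7 15,&i);   /* number of neighbors for this pass */
		/* k-nearest-neighbor classification: OUT= holds the resubstitution
		   results for TRAIN, TESTOUT= holds the classification of TEST */
		proc discrim data=train method=npar k=&k
				out=train_k&k._out(keep=digit _into_)
				testdata=test testout=test_k&k._out(keep=digit _into_);
			class digit;
			var gs1-gs256;
		run;
	%end;
%mend;
%knn;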

Using just digits 2 and 3, I get test-set error rates (the test data is available at the book's website) ranging from 6% (k = 1) up to 10% (k = 15).  Those don't agree with a couple of solutions on the web: Andrew Tulloch shows error rates between 2% and 4%, while Weatherwax and Epstein report error rates between 9% and 11%.
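For reference, here is a sketch of the two steps that aren't shown in the code: keeping only the 2s and 3s, and turning the TESTOUT= datasets written by %knn into error rates. It assumes the train and test datasets store the class in a numeric variable named digit; the macro %knn_err and the datasets err_k&k and knn_errors are hypothetical names used only for illustration.

```sas
/* Assumption: train and test already hold the full zip code data, with
   the class in a numeric variable named digit. Keep only the 2s and 3s. */
data train;
	set train;
	where digit in (2, 3);
run;

data test;
	set test;
	where digit in (2, 3);
run;

/* Rerun the classification on the subset (the %knn macro from above) */
%knn;

/* Hypothetical helper: test-set error rate for each k. _INTO_ is the
   class PROC DISCRIM assigned, so a case is an error when digit ne _into_. */
%macro knn_err;
	%local i k;
	%do i = 1 %to 5;
		%let k = %scan(1 3 5 7 15,&i);
		data err_k&k(keep=k error_rate);
			set test_k&k._out end=last;
			errors + (digit ne _into_);   /* running count of misclassified cases */
			if last then do;
				k = &k;
				error_rate = errors / _n_;   /* _n_ = number of test cases read */
				output;
			end;
		run;
	%end;

	/* Stack the per-k results into one table */
	data knn_errors;
		set err_k1 err_k3 err_k5 err_k7 err_k15;
	run;
%mend;
%knn_err;

proc print data=knn_errors noobs;
	format error_rate percent8.2;
run;
```

Overwriting train and test in place is only for brevity; writing the subsets to new datasets and pointing DATA= and TESTDATA= at them would keep the full data intact.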

Is there anyone else who has done the exercise and can confirm which of the three answers (if any) is correct?

Martin

2 REPLIES
Reeza
Super User

Is there a reason you used PROC DISCRIM instead of PROC CLUSTER or PROC FASTCLUS?

mcs
Obsidian | Level 7

I haven't used either of those before, and after a quick look at the documentation, I couldn't figure out how to make them do what I want.

Can you explain how clustering lets me classify digits?  I assume I would cluster the training dataset and then somehow use the output to score the test dataset, but I don't understand the details.  Specifically, how would I use the known value of the digit in the training dataset?
