mcs

I've just begun working my way through the exercises in The Elements of Statistical Learning.  Exercise 2.8 asks you to use k-nearest neighbors to classify scanned zipcode digits from greyscale values (gs1-gs256).  Here's some of my code.

 

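%macro knn;
	%local i k;
	/* Fit and score a k-nearest-neighbor classifier for each k in turn */
	%do i = 1 %to 5;
		%let k = %scan(1 3 5 7 15, &i);
		/* METHOD=NPAR with K= requests k-nearest-neighbor classification.
		   OUT= holds resubstitution classifications of the training data;
		   TESTOUT= holds the classifications of the TESTDATA= observations. */
		proc discrim data=train method=npar k=&k
				out=train_k&k._out(keep=digit _into_)
				testdata=test testout=test_k&k._out(keep=digit _into_);
			class digit;
			var gs1-gs256;
		run;
	%end;
%mend knn;
%knn;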

Using just digits 2 and 3, I get error rates on the test data (available at the book's website) ranging from 6% (k = 1) up to 10% (k = 15).  Those don't agree with a couple of solutions on the web: Andrew Tulloch shows error rates between 2% and 4%, while Weatherwax and Epstein have error rates between 9% and 11%.
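To make the comparison with the published numbers concrete, here is a minimal sketch of how the test error rate can be computed from the TESTOUT= data sets, assuming each one contains only DIGIT (true class) and _INTO_ (predicted class), as the KEEP= options in the code above arrange. The macro %misclass_rate and its parameters are hypothetical names for illustration, not anything supplied by SAS; it just counts the rows where _INTO_ differs from DIGIT and divides by the number of rows.

```sas
/* Hypothetical helper (not a SAS-supplied macro): proportion of
   misclassified observations in a scored data set.
   Assumes DATA= contains DIGIT (true class) and _INTO_ (predicted class). */
%macro misclass_rate(data=, k=, out=);
	data &out;
		set &data end=last;
		wrong + (digit ne _into_);   /* sum statement: running count of errors */
		if last then do;
			k = &k;
			error_rate = wrong / _n_;  /* at the last row, _N_ = rows read */
			output;
		end;
		keep k error_rate;
	run;
%mend misclass_rate;
```

After %knn has run, something like the following would collect one row per k (the err_k* and err_all names are again only illustrative):

```sas
%misclass_rate(data=test_k1_out,  k=1,  out=err_k1);
%misclass_rate(data=test_k3_out,  k=3,  out=err_k3);
%misclass_rate(data=test_k5_out,  k=5,  out=err_k5);
%misclass_rate(data=test_k7_out,  k=7,  out=err_k7);
%misclass_rate(data=test_k15_out, k=15, out=err_k15);
data err_all;
	set err_k1 err_k3 err_k5 err_k7 err_k15;
run;
proc print data=err_all noobs;
	format error_rate percent8.1;
run;
```

A DATA step keeps the result to one number per k; PROC FREQ with TABLES digit*_into_ would give the full confusion matrix instead. If the goal is to reproduce the book's plain Euclidean, majority-vote k-NN, two PROC DISCRIM defaults may also be worth checking, as I read the documentation (I have not tested how much they change these error rates): METRIC=FULL, which computes distances using the pooled covariance matrix rather than Euclidean distance (METRIC=IDENTITY), and equal prior probabilities, which weight each class's neighbor count by its class size (a PRIORS PROPORTIONAL statement makes the posterior the plain proportion of neighbors in each class).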

 

 

Is there anyone else who has done the exercise and can confirm which of the three answers (if any) is correct?

 

Martin

Reeza

Is there a reason you used PROC DISCRIM instead of PROC CLUSTER or PROC FASTCLUS?

mcs

I haven't used either of those before, and after a quick look at the documentation, I couldn't figure out how to make them do what I want.

 

Can you explain how clustering lets me classify digits?  I assume I would cluster the training dataset and then somehow use the output to score the test dataset, but I don't understand the details.  Specifically, how would I use the known value of the digit in the training dataset?
