Here's the situation: The Adventures of Proc Studley -- a popular Sci-Fi/Fantasy book series -- is under threat from a string of counterfeit versions that have hit the marketplace. Your task is to find the fakes. All of the authentic books have valid 10-digit ISBN values, which is how libraries around the world track books. The counterfeit books have invalid ISBN values.
(This challenge first premiered in SAS Analytics Explorers, a special group for customers who want to do more with their SAS learning and earn rewards in the process!)
The algorithm for validating ISBN-10 values uses a checksum approach. Here are the steps:
Here are the book titles and their purported ISBN values. Write the shortest possible SAS program that reads the book list, validates each ISBN value, and creates a report of the real and fake books. Include your code and the output in your response!
(Note: obviously all of these books are made up, but the ISBN scheme and algorithm are real! You can check your work ad hoc with the ISBN Checker.)
Here's the data, all ready to run in SAS.
data isbn;
infile datalines dsd;
length title $ 70 isbn $ 10;
input title isbn;
datalines;
Proc Studley and the Starship of Destiny,0434488665
The Chronicles of Proc Studley: The Lost Realm,2018166516
Proc Studley and the Quantum Key,9405643837
The Legend of Proc Studley: The Celestial Quest,6032522768
Proc Studley and the Enchanted Nebula,4394205952
The Adventures of Proc Studley: The Galactic Rift,2353276079
Proc Studley and the Time Crystal,6493135591
The Saga of Proc Studley: The Forbidden Planet,6776994355
Proc Studley and the Alien Alliance,2227835451
The Epic of Proc Studley: The Cosmic War,8018735913
Proc Studley and the Dragon of Andromeda,0841779538
The Odyssey of Proc Studley: The Stellar Siege,8730652341
Proc Studley and the Phoenix Star,1594122350
The Journey of Proc Studley: The Nebula Nexus,224320418X
Proc Studley and the Shadow Realm,6857923406
The Quest of Proc Studley: The Celestial Citadel,3967006111
Proc Studley and the Interstellar Insurrection,9537581977
The Trials of Proc Studley: The Quantum Paradox,1283514257
Proc Studley and the Martian Rebellion,566485052X
The Legacy of Proc Studley: The Galactic Guardians,6994588902
Proc Studley and the Eternal Eclipse,9236137644
The Chronicles of Proc Studley: The Andromeda Enigma,7649918275
Proc Studley and the Infinite Horizon,458574645X
The Adventures of Proc Studley: The Alien Dawn,7601111520
Proc Studley and the Black Hole Conspiracy,1911465988
The Legend of Proc Studley: The Cosmic Code,266671036X
Proc Studley and the Celestial Shadows,4287030303
The Saga of Proc Studley: The Stellar Saga,2561407012
Proc Studley and the Quantum Quest,6933010252
The Epic of Proc Studley: The Celestial Conflict,4278675463
Proc Studley and the Galactic Gambit,351049735X
The Odyssey of Proc Studley: The Nebula Knights,0566910450
Proc Studley and the Cosmic Crusade,9802436077
The Journey of Proc Studley: The Andromeda Ascension,195432331X
Proc Studley and the Stellar Struggle,8421179217
The Quest of Proc Studley: The Alien Artifact,0979272564
Proc Studley and the Celestial Saga,3584795834
The Trials of Proc Studley: The Galactic Genesis,0713565068
Proc Studley and the Quantum Conundrum,9074601407
The Legacy of Proc Studley: The Cosmic Odyssey,0168786583
;
run;
data validation;
set isbn;
array _nine [9] (10, 9, 8, 7, 6, 5, 4, 3, 2);
array nine [9];
do i=1 to 9;
nine{i}= input(substr(isbn,i, 1), best.)*_nine{i};
end;
_sum=sum(of nine:);
_remainder= mod(_sum, 11);
result= 11-_remainder;
if result eq 10 then isbn_validated= substr(isbn, 1, 9)||'X';
else if result eq 11 then isbn_validated= substr(isbn, 1, 9)||'0';
if isbn=isbn_validated;
keep title isbn:;
run;
proc print; run;
These are the real ones.
The remaining ones are fake.
Good attempt @A_Kh , but check your work! I think you've allowed only those that end with '0' or 'X'. There are actually several more valid ISBNs in the collection 😉
@ChrisHemedinger wrote:
The algorithm for validating ISBN-10 values uses a checksum approach. Here are the steps:
....
- For the 10th digit, if the summed result is 11, use the number 0; if 10, use the letter X.
What would the 10th digit be if the summed result is neither 10 nor 11?
My fault for not making it clear! The final checksum digit is the difference calculated in the previous step. If it's a 2-digit difference: for 11, use 0; for 10, use X. I edited the problem for clarity.
Edited based on the clarification:
data validation;
set isbn;
array _nine [9] (10, 9, 8, 7, 6, 5, 4, 3, 2);
array nine [9];
do i=1 to 9;
nine{i}= input(substr(isbn,i, 1), best.)*_nine{i};
end;
_sum=sum(of nine:);
_remainder= mod(_sum, 11);
result= 11-_remainder;
if result eq 10 then checker= substr(isbn, 1, 9)||'X';
else if result eq 11 then checker= substr(isbn, 1, 9)||'0';
else checker= cats(substr(isbn, 1, 9), result);
if isbn=checker then isbn_validated= 'Yes';
else isbn_validated='No';
if isbn_validated='No';
keep title isbn:;
run;
proc print; run;
Fake books (N=19):
Here's another suggestion:
data;
set isbn;
s=0;
do j=1 to 9;
s+j*input(char(isbn,j),1.);
end;
m=mod(s,11);
c=put(ifn(m=j,.X,m),1.)=char(isbn,j);
put isbn c;
run;
The report is only suitable for internal purposes as it contains just the ISBNs and a "correctness flag" c, i.e., c=0 indicates fake ISBNs, c=1 correct ones:
0434488665 0
2018166516 0
9405643837 0
6032522768 1
4394205952 0
2353276079 0
6493135591 1
6776994355 0
2227835451 1
8018735913 1
0841779538 1
8730652341 0
1594122350 1
224320418X 0
6857923406 0
3967006111 0
9537581977 1
1283514257 1
566485052X 0
6994588902 0
9236137644 1
7649918275 0
458574645X 0
7601111520 1
1911465988 1
266671036X 1
4287030303 1
2561407012 1
6933010252 1
4278675463 0
351049735X 1
0566910450 0
9802436077 0
195432331X 1
8421179217 1
0979272564 1
3584795834 1
0713565068 0
9074601407 0
0168786583 1
I used a bit of algebra to simplify the formulas.
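The algebra is not spelled out in the post, so the following is only a sketch of one way to arrive at the simplified check. It assumes the usual ISBN-10 convention that the digits $d_1,\dots,d_{10}$ (with X counting as 10) are weighted 10 down to 1 and the weighted sum must be divisible by 11; the symbol $S$ is introduced here for illustration and corresponds to the variable s in the DATA step.

```latex
\begin{align*}
\sum_{i=1}^{10} (11-i)\,d_i &\equiv 0 \pmod{11}
  && \text{(validity condition)} \\
-\sum_{i=1}^{10} i\,d_i &\equiv 0 \pmod{11}
  && \text{(since } 11-i \equiv -i \text{)} \\
\sum_{i=1}^{9} i\,d_i + 10\,d_{10} &\equiv 0 \pmod{11} \\
\sum_{i=1}^{9} i\,d_i - d_{10} &\equiv 0 \pmod{11}
  && \text{(since } 10 \equiv -1 \text{)} \\
d_{10} &\equiv S \pmod{11},
  && S = \sum_{i=1}^{9} i\,d_i
\end{align*}
```

Under that reading, the check digit is simply mod(s,11), with a result of 10 written as X, so the "11 minus remainder" step and the special case for 11 are no longer needed.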
Of course, the code could be shortened further. For example:
However, the resulting DATA step (containing 108 characters) violates so many coding standards that it must be hidden behind a spoiler.
data;set;s=0;do j=1to 9;s+j*char(isbn,j);end;m=mod(s,11);c=put(ifn(m=j,.X,m),1.)=char(isbn,j);put _all_;run;
I know @yabwon is a fan of SAS Code Golf. And he gave a couple solutions in the Analytics Explorers thread. Maybe he'll take a swing at golfing this one.
@Quentin thanks for calling me out.
@FreelanceReinh thanks for the algebra, I forgot that we are in Z11 🙂
If we are already in the realm of "breaking good programming practice", your already "skinny" solution could be boiled down by a few more bytes, to 92, like this:
data;set;s=0;do j=1to 9;s+j*char(isbn,j);end;c=min(char(isbn,j),10)=mod(s,11);put _all_;run;
Of course, the tacit assumption is that the ISBN dataset was created directly before that step.
In the first test I was hoping to use the >< operator ( char(isbn,j)><10 ), but it's not "missing-value-proof"...
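A minimal sketch of the difference, assuming the usual DATA step behavior where the >< (MIN) operator treats a missing value as the lowest value while the MIN function skips missing arguments. The variables x, a and b are made up for illustration; x stands in for what a check character of 'X' becomes after automatic character-to-numeric conversion.

```sas
data _null_;
  x = .;            /* stand-in for the numeric conversion of 'X' */
  a = x >< 10;      /* MIN operator: missing sorts lowest, so a is missing */
  b = min(x, 10);   /* MIN function: ignores missing, so b is 10 */
  put a= b=;
run;
```

With the operator, an ISBN ending in X would therefore be compared as missing rather than as 10, which is why the MIN function is used in the golfed step.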
User friendly version:
data test;
set isbn;
s=0;
do j=1to 9;
s+j*char(isbn,j);
end;
c=min(char(isbn,j),10)=mod(s,11);
put _all_;
run;
The version I started with was using CALL POKELONG to split ISBN variable into an array of values, but all I could do was 116 bytes
data;set;array i[10]$1;CALL POKELONG(ISBN,ADDRLONG(i1),10);
s=0;do r=1to 9;s+i[r]*r;end;b=min(i[r],10)=mod(s,11);run;
and that was with no "put _all_;" in the code; with PUT it's +10
User friendly version:
data test2;
set isbn;
array i[10]$1;
CALL POKELONG(ISBN,ADDRLONG(i1),10);
s=0;
do r=1to 9;
s+i[r]*r;
end;
b=min(i[r],10)=mod(s,11);
run;
For 32 bit sas it could be 8 bytes shorter, because we could use CALL POKE() and ADDR().
Will try to golf a bit more, but that 92 looks really hard.
Bart
@yabwon: Very clever, as usual! This is a substantial shortening (which comes at the price of not recognizing an "ISBN" like "266671036Y" as incorrect, but this was not a requirement). Other than the trivial replacement of the literal "10" by variable "j" (→ 91 characters) I don't see any room for further "improvement" at the moment.
dat;se;arra i[10]$1;CAL POKELONG(ISBN,ADDRLONG(i1),10);
s=0;do r=1to 9;s+i[r]*r;en;b=min(i[r],10)=mod(s,11);ru;
Is this legal?
You will need to correct the s accumulation to
s + i[r] * (11-r) ; ;
or
s + i[10-r] * (r+1) ;
And the code will not classify ISBNs with an X checksum as valid (for example 000000014X)
The math is OK, and the code works well with "X", for example
000000014X
195432331X
the first one is marked as an invalid and the second as a valid ISBN.
data have;
input ISBN $10.;
cards;
000000014X
195432331X
run;
dat;se;arra i[10]$1;CAL POKELONG(ISBN,ADDRLONG(i1),10);
s=0;do r=1to 9;s+i[r]*r;en;b=min(i[r],10)=mod(s,11);ru;
proc print;
run;
result:
Bart
data want ;
set isbn ;
v = 0 ; keep title isbn v ;
do i = 1 to 9 ;
length c $1 ; c = substr(isbn,i,1) ; if notdigit(c) then return ;
vsum = sum(vsum,c*(11-i)) ;
end;
v = substr('123456789X0',11-mod(vsum,11)) =: substr(isbn,10,1) ; /* lookup by 11-remainder: 10 -> X, 11 -> 0 */
run ;