Thanks @FreelanceReinh - fixed now.
Unlike production code, blog posts and community messages are easy to fix 🙂
@FreelanceReinh Sir, sorry to bother you with something so trivial. However,
the code returns 0 when the value is blank/missing, and since we are after the highest digit, my understanding is that the result should be blank too. So, is a small tweak required?
data _null_;
  str='000112010302';
  b=findc('123456789',str,-9);
  put b= ;
  str='000';
  b=findc('123456789',str,-9);
  put b= ;
  call missing(str); /* this part, as @Tom pointed out earlier */
  b=findc('123456789',str,-9);
  put b= ;
run;
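One possible tweak, offered only as a sketch: wrap the FINDC call in IFN so that a blank input yields a missing value instead of 0. The loop over three test values and the LENGTH statement are illustrative assumptions, not part of the original entry.

```sas
data _null_;
  length str $12;
  /* Three illustrative inputs: mixed digits, all zeros, blank */
  do str = '000112010302', '000', ' ';
    /* Return missing for a blank string; otherwise search right to left */
    b = ifn(missing(str), ., findc('123456789', str, -9));
    put str= b=;
  end;
run;
```

With this change the all-zero string should still give 0, and only the blank string gives a missing value.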
@novinosrin Sir, I think that Tom's variant of the code handles the situation of a blank input string nicely (at the cost of only two more characters!), even though it returns -1 in this case, not a missing value. My understanding of the title, "highest digit in a string of digits", was that it's all about a string of digits. In a real-world application, as opposed to a coding puzzle, a solution would surely need to be more robust against missing values and other non-digit characters. Brevity and elegance of the code would be less important.
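For illustration, here is a minimal sketch of how that variant behaves on the same kinds of input. It uses the expression h=10-findc('9876543210 ',k) as quoted further down the thread; the test values and the variable h_or_missing (a hypothetical name) are assumptions added for the demonstration.

```sas
data _null_;
  length k $12;
  /* Three illustrative inputs: mixed digits, all zeros, blank */
  do k = '000112010302', '000', ' ';
    /* The trailing blank in the search string is matched only when
       k contains no digit at all, giving position 11 and h = -1 */
    h = 10 - findc('9876543210 ', k);
    /* Optional: map the -1 sentinel to a missing value */
    h_or_missing = ifn(h < 0, ., h);
    put k= h= h_or_missing=;
  end;
run;
```

Because FINDC scans the search string from the left, the first character it finds is the highest digit present in k.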
Thanks for the quick corrections. (One leading zero is still missing: in the %LET statement.)
Chris,
Brevity is important, especially since it is the stated goal of the challenge. But there's another aspect. Golf is little fun if you have to crawl from hole to hole, even if you end up well under par. Here, too, it matters how fast different approaches get there. In this respect, not only is FreelanceReinhard's method the tersest, but it also outperforms every other technique hands down. I had suspected this was the case because SAS string search functions are based on an extremely rapid underlying algorithm, but I didn't know by how much before testing it. The step below tests the FINDC approach, the APP approach, and the loop approach.
The result is that the run times for the three methods (in the order listed above) are related as 1:10:15. A factor of ten is a pretty astounding difference between FINDC and the next fastest method, the APP. The latter would probably fare better if it didn't have to do the implicit character-to-numeric type conversions. It also shows how relatively slow the combination of looping and the CHAR function is.
On a different note, I see nothing inherently "dangerous" in CALL POKE(LONG) and have used it to speed things up (including in production under W, U/L, and z/OS) since at least 1998 (and even penned a few papers on the subject). At any rate, it's totally safe to write to an address already occupied by an element of a temp array - which also guarantees that address-wise, adjacent elements abut each other in physical memory. (Note how this circumstance is exploited below to populate all bytes of N L-long elements of a temp array with random digits without a need to loop through the array itself.)
Best regards
Paul D.
%let N = 100 ; /* Number of random strings to test */
%let L = 32 ; /* Length of each string */
%let R = 10000 ; /* Number of test repetitions */
data _null_ ;
  array ss [&N] $ &L _temporary_ ;
  /* Populate each L-long string with random digits from 0 to 9 */
  call streaminit (7) ;
  x = addrlong (ss[1]) ;
  do j = 1 to &N * &L ;
    call pokelong (put (rand ("uniform", 0, 9), 1.), ptrlongadd (x, j - 1), 1) ;
  end ;
  /* FINDC approach */
  t = time() ;
  do r = 1 to &R ;
    do j = 1 to &N ;
      max = findc ("123456789", ss[j], -9) ;
    end ;
  end ;
  t1 = time() - t ;
  /* APP approach - note implicit type conversions */
  t = time() ;
  array cc [&L] $ 1 _temporary_ ;
  do r = 1 to &R ;
    do j = 1 to &N ;
      call pokelong (ss[j], addrlong (cc[1]), &L) ;
      max = max (of cc[*]) ;
    end ;
  end ;
  t2 = time() - t ;
  /* Loop approach - $1-to-$1 comparison with conversion */
  t = time() ;
  do r = 1 to &R ;
    do j = 1 to &N ;
      cm = "0" ;
      do i = 1 to &L ;
        cm = cm <> char (ss[j], i) ;
      end ;
      max = input (cm, 1.) ;
    end ;
  end ;
  t3 = time() - t ;
  rel_time = catx (":", 1, round (t2/t1), round (t3/t1)) ;
  put rel_time ;
run ;
@ChrisHemedinger: I'm a bit late to the discussion but, since the goal was to produce code that required the fewest keystrokes, I'm surprised that no one took advantage of SAS's automatic spelling correction that goes on when code is submitted.
I liked one of @Tom's entries, thus shortened his code by three keystrokes:
dat have;
inpu k $12.;
card;
000112010302
0
1
2
3
4
5
6
7
8
9
0123456987
.
;
dat w2;
se have;
h=10-findc('9876543210 ',k);
ru;
Art, CEO, AnalystFinder.com
Creative move, @art297!
For me, code golf is about hitting the mark in the fewest number of moves (statements, functions, etc.) and not worrying about the economy of characters in the code. Otherwise we'd all be worrying about using the functions with the shortest names, skipping semicolons where not strictly necessary, and so on.
@ChrisHemedinger: I disagree as the ultimate measure in golf is strokes. Method, in golf, only counts if it reduces strokes. However, it's your game, thus you get to define the rules.
Regardless, while not very readable, here (for those who are interested in keystroke reduction) is my latest (and I think final) shortened version of Tom's entry:
dat have;
inpu k $12.;
card;
000112010302
0
1
2
3
4
5
6
7
8
9
0123456987
.
;
dat w;
se;
h=10-findc('9876543210 ',k);
ru;
Art, CEO, AnalystFinder.com
Hi Chris,
I know that I'm a bit late to the discussion and I can't compete with the solutions already presented, but I didn't find one with "@@", so let me add one (Paul's loop solution was the inspiration).
data _null_;
c = '000112010302';
_infile_=c;
input x 1. @@; retain max .;
max= x <> max;
put +(-6) "max=" max @;
cards;
;
run;
all the best
Bart J.
Bart,
Very interesting. It could be especially useful if the data were read instream from a text file and processed length(c) bytes at a time, since that would make substringing from _infile_ unnecessary. From time to time we need to be reminded of the power of the INPUT statement.
Best
Paul D.
Hi Paul,
Thanks for the kind words.
Bart J.
P.S. If the string can be read in from CARDS and you don't care about an empty string (max will be 0), the code can be even shorter (no explicit _INFILE_ and no RETAIN needed):
data _null_;
input x 1. @@;
max + (-max + (x <> max));
put +(-6) "max=" max @;
cards;
000112010302
;
run;
You can shorten the one that uses COMPRESS().
If you force the target variable to be a character variable of length one you can eliminate the FIRST() function call.
As long as you use a nice short variable name it will save a couple of bytes.
a=first(compress('9876543210','000112010302','k'));
b='';b=compress('9876543210','000112010302','k');
Or set the new variable to the letter K and use it as the third argument to COMPRESS() and save one more character.
c='k';c=compress('9876543210','000112010302',c);
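To try the three variants side by side, they can be dropped into a single step. This is just a sketch for comparison; the DATA _NULL_ wrapper and the PUT statement are additions, and the assignments are copied unchanged from above.

```sas
data _null_;
  /* Variant 1: FIRST() picks the first kept character */
  a=first(compress('9876543210','000112010302','k'));
  /* Variant 2: b is created with length 1, so the result is truncated */
  b='';b=compress('9876543210','000112010302','k');
  /* Variant 3: c doubles as the 'k' (keep) modifier and as the target */
  c='k';c=compress('9876543210','000112010302',c);
  put a= b= c=;
run;
```

With the 'k' modifier, COMPRESS keeps only the characters of '9876543210' that occur in the second argument, in descending order, so the first kept character is the highest digit.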
how about this?
data _null_;
c = '000112010302' ;
retain max 0;
do i = 1 to length(c);
x = rank(char(c,i)) - 48;
max = max(max,x);
end;
put max =;
run;
@KachiM While your code looks nice and fancy, it actually deviates from the objective and scope of the discussion topic, a.k.a. "golf", i.e. the least code to get the result. Scope creep is not the objective, IMHO.