How do you select a range of characters in PROC SQL using the LIKE operator?
For example:
DATA test; INPUT n x $; cards; 1 abcd 2 1000 3 EF3H 4 _4*4 ; RUN; PROC SQL; SELECT n, x FROM test WHERE x LIKE "%[A-Z]%" ; QUIT;
Row 3 should be selected, but when I run it no rows are selected.
EDIT: my main goal is to select rows in which there are letters in the string. I get that you could do
if findc(lowcase(x),"abcdefghijklmnopqrstuvwxyz") > 0
in a data step, but just wondering if there's an equivalent in PROC SQL since I'm already using it to query a SQL database. Also I would want a flexible way of doing something like this, so maybe I'm looking for strings where any character could be between A and L, or J and X etc.
I read your edit ... it seems like you are searching for any uppercase letter anywhere in the string, but ignoring lowercase letters. If that's the case, this would be the key condition:
where compress(x, , 'kU') > ' ';
Within the COMPRESS function, the third parameter is specifying k=KEEP rather than remove and U=Uppercase letters.
If you want to locate any letter (whether uppercase or lowercase) it would be:
where compress(x, , 'ka') > ' ';
Of course, the SQL interpreter will need to understand how to use COMPRESS.
And if you want a subset of the letters, you would still have to spell them out:
where compress(x, 'ABCDEFGHIJ', 'k') > ' ';
You could use perl regular expressions. I would use a small trick however. Compress function can remove certain sets of characters, then you can check the length:
data test; input n x $; cards; 1 abcd 2 1000 3 EFGH 4 efgh ; run; proc sql; create table WANT as select * from TEST
where lengthn(compress(X,' ','a'))=0; quit;
What I do is remove all the alphanumeric characters and blanks (blank from the second parameter, all characters with the 'a' option - check out the SAS docs for other options). Then I take the length of the string and if it is zero then it only had those characters so we keep it. Its basically reversing the thinking.
As a note, don't use tabs in your code, use 2 spaces, otherwise it renders differently in different editors.
Ehh, tabs makes more sense since people like to see different indentation lengths. You could configure your editor to show however many spaces you like.
Anyway please see my edit. I'm trying to find rows in which the string has letter characters.
Update to my code to reflect your change in logic:
data test; input n x $; cards; 1 abcd 2 1000 3 EF3H 4 efgh ; run; proc sql; create table WANT as select * from TEST where lengthn(compress(X,' ','d')) >0; quit;
As for tabs, yes I could set my editors up to follow that logic, and change the setup of my browser to display them. However I amnot going to do that as I would forever be changing setups. Two spaces always render as two spaces, tabs render differently in every application, even posting your code above gives different results.
Ok, lets try this one then:
data test; input n x $; cards; 1 abcd 2 1000 3 EF3H 4 efgh ; run; proc sql; create table WANT as select * from TEST where findc(X,'a','a'); quit;
Find will return the first position of a character given in the second parameter, and I use the 'a' third parameter to add all alphabet to the list.
As long as you intend to examine the beginning of a character string, there is no need to use LIKE. You can take advantage of the fact that there are no characters between "A" and "Z" in an ASCII collating sequence:
where x >= 'A' and x < '[';
Note that "[" is the next character following "Z" in the ASCII collating sequence.
I read your edit ... it seems like you are searching for any uppercase letter anywhere in the string, but ignoring lowercase letters. If that's the case, this would be the key condition:
where compress(x, , 'kU') > ' ';
Within the COMPRESS function, the third parameter is specifying k=KEEP rather than remove and U=Uppercase letters.
If you want to locate any letter (whether uppercase or lowercase) it would be:
where compress(x, , 'ka') > ' ';
Of course, the SQL interpreter will need to understand how to use COMPRESS.
And if you want a subset of the letters, you would still have to spell them out:
where compress(x, 'ABCDEFGHIJ', 'k') > ' ';
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.