BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
CJ_Jackson
Fluorite | Level 6

How do you select a range of characters in PROC SQL using the LIKE operator?

 

For example:

 

DATA test;
	INPUT n x $;
	cards;
	1 abcd
	2 1000
	3 EF3H
	4 _4*4
	;
RUN;

PROC SQL;
	SELECT n, x
	FROM test
	WHERE x LIKE "%[A-Z]%"
	;
QUIT;

Row 3 should be selected, but when I run it no rows are selected.

 

EDIT: my main goal is to select rows in which there are letters in the string. I get that you could do

if findc(lowcase(x),"abcdefghijklmnopqrstuvwxyz") > 0 

in a data step, but just wondering if there's an equivalent in PROC SQL since I'm already using it to query a SQL database. Also I would want a flexible way of doing something like this, so maybe I'm looking for strings where any character could be between A and L, or J and X etc.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

I read your edit ... it seems like you are searching for any uppercase letter anywhere in the string, but ignoring lowercase letters.  If that's the case, this would be the key condition:

 

where compress(x, , 'kU') > ' ';

 

Within the COMPRESS function, the third parameter is specifying k=KEEP rather than remove and U=Uppercase letters.

 

If you want to locate any letter (whether uppercase or lowercase) it would be:

 

where compress(x, , 'ka') > ' ';

 

Of course, the SQL interpreter will need to understand how to use COMPRESS.

 

And if you want a subset of the letters, you would still have to spell them out:

 

where compress(x, 'ABCDEFGHIJ', 'k') > ' ';

 

 

 

 

View solution in original post

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You could use perl regular expressions.  I would use a small trick however.  Compress function can remove certain sets of characters, then you can check the length:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EFGH
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST	
where lengthn(compress(X,' ','a'))=0; quit;

What I do is remove all the alphanumeric characters and blanks (blank from the second parameter, all characters with the 'a' option - check out the SAS docs for other options).  Then I take the length of the string and if it is zero then it only had those characters so we keep it.  Its basically reversing the thinking.

 

As a note, don't use tabs in your code, use 2 spaces, otherwise it renders differently in different editors.

CJ_Jackson
Fluorite | Level 6

Ehh, tabs makes more sense since people like to see different indentation lengths. You could configure your editor to show however many spaces you like.

 

Anyway please see my edit. I'm trying to find rows in which the string has letter characters.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Update to my code to reflect your change in logic:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EF3H
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST
  where  lengthn(compress(X,' ','d')) >0;
quit;

As for tabs, yes I could set my editors up to follow that logic, and change the setup of my browser to display them.  However I amnot going to do that as I would forever be changing setups.  Two spaces always render as two spaces, tabs render differently in every application, even posting your code above gives different results.

CJ_Jackson
Fluorite | Level 6
I don't think this will work though. What if there are symbols like period or underscore, etc.

In any case, I'm not specifically looking for solutions that has all alphabetic characters. That's just one example. I would want a solution that would work if I only wanted to search for string that contains any letter between say 'c' and 'k', etc.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, lets try this one then:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EF3H
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST
  where  findc(X,'a','a');
quit;

Find will return the first position of a character given in the second parameter, and I use the 'a' third parameter to add all alphabet to the list.

Astounding
PROC Star

As long as  you intend to examine the beginning of a character string, there is no need to use LIKE.  You can take advantage of the fact that there are no characters between "A" and "Z" in an ASCII collating sequence:

 

where x >= 'A' and x < '[';

 

Note that "[" is the next character following "Z" in the ASCII collating sequence.

CJ_Jackson
Fluorite | Level 6
Please see my edit. I'm not just searching for the first character, but in any character in the string where there's an alphabetic character.
Astounding
PROC Star

I read your edit ... it seems like you are searching for any uppercase letter anywhere in the string, but ignoring lowercase letters.  If that's the case, this would be the key condition:

 

where compress(x, , 'kU') > ' ';

 

Within the COMPRESS function, the third parameter is specifying k=KEEP rather than remove and U=Uppercase letters.

 

If you want to locate any letter (whether uppercase or lowercase) it would be:

 

where compress(x, , 'ka') > ' ';

 

Of course, the SQL interpreter will need to understand how to use COMPRESS.

 

And if you want a subset of the letters, you would still have to spell them out:

 

where compress(x, 'ABCDEFGHIJ', 'k') > ' ';

 

 

 

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 7209 views
  • 2 likes
  • 3 in conversation