Solved: range of characters in PROC SQL where statement using LIKE operator?

CJ_Jackson · Posted 01-19-2016 11:31 AM

How do you select a range of characters in PROC SQL using the LIKE operator?

For example:

DATA test;
	INPUT n x $;
	cards;
	1 abcd
	2 1000
	3 EF3H
	4 _4*4
	;
RUN;

PROC SQL;
	SELECT n, x
	FROM test
	WHERE x LIKE "%[A-Z]%"
	;
QUIT;

Row 3 should be selected, but when I run it no rows are selected.

EDIT: my main goal is to select rows in which there are letters in the string. I get that you could do

if findc(lowcase(x),"abcdefghijklmnopqrstuvwxyz") > 0

in a data step, but just wondering if there's an equivalent in PROC SQL since I'm already using it to query a SQL database. Also I would want a flexible way of doing something like this, so maybe I'm looking for strings where any character could be between A and L, or J and X etc.

Astounding · Posted 01-19-2016 12:54 PM

I read your edit ... it seems like you are searching for any uppercase letter anywhere in the string, but ignoring lowercase letters. If that's the case, this would be the key condition:

where compress(x, , 'kU') > ' ';

Within the COMPRESS function, the third parameter is specifying k=KEEP rather than remove and U=Uppercase letters.

If you want to locate any letter (whether uppercase or lowercase) it would be:

where compress(x, , 'ka') > ' ';

Of course, the SQL interpreter will need to understand how to use COMPRESS.

And if you want a subset of the letters, you would still have to spell them out:

where compress(x, 'ABCDEFGHIJ', 'k') > ' ';

View solution in original post

RW9 · Posted 01-19-2016 11:38 AM

You could use perl regular expressions. I would use a small trick however. Compress function can remove certain sets of characters, then you can check the length:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EFGH
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST	
  where  lengthn(compress(X,' ','a'))=0;
quit;

What I do is remove all the alphanumeric characters and blanks (blank from the second parameter, all characters with the 'a' option - check out the SAS docs for other options). Then I take the length of the string and if it is zero then it only had those characters so we keep it. Its basically reversing the thinking.

As a note, don't use tabs in your code, use 2 spaces, otherwise it renders differently in different editors.

CJ_Jackson · Posted 01-19-2016 12:02 PM

Ehh, tabs makes more sense since people like to see different indentation lengths. You could configure your editor to show however many spaces you like.

Anyway please see my edit. I'm trying to find rows in which the string has letter characters.

RW9 · Posted 01-19-2016 12:07 PM

Update to my code to reflect your change in logic:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EF3H
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST
  where  lengthn(compress(X,' ','d')) >0;
quit;

As for tabs, yes I could set my editors up to follow that logic, and change the setup of my browser to display them. However I amnot going to do that as I would forever be changing setups. Two spaces always render as two spaces, tabs render differently in every application, even posting your code above gives different results.

CJ_Jackson · Posted 01-19-2016 12:16 PM

I don't think this will work though. What if there are symbols like period or underscore, etc.

In any case, I'm not specifically looking for solutions that has all alphabetic characters. That's just one example. I would want a solution that would work if I only wanted to search for string that contains any letter between say 'c' and 'k', etc.

RW9 · Posted 01-19-2016 12:23 PM

Ok, lets try this one then:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EF3H
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST
  where  findc(X,'a','a');
quit;

Find will return the first position of a character given in the second parameter, and I use the 'a' third parameter to add all alphabet to the list.

Astounding · Posted 01-19-2016 11:48 AM

As long as you intend to examine the beginning of a character string, there is no need to use LIKE. You can take advantage of the fact that there are no characters between "A" and "Z" in an ASCII collating sequence:

where x >= 'A' and x < '[';

Note that "[" is the next character following "Z" in the ASCII collating sequence.

CJ_Jackson · Posted 01-19-2016 12:18 PM

Please see my edit. I'm not just searching for the first character, but in any character in the string where there's an alphabetic character.

Astounding · Posted 01-19-2016 12:54 PM

I read your edit ... it seems like you are searching for any uppercase letter anywhere in the string, but ignoring lowercase letters. If that's the case, this would be the key condition:

where compress(x, , 'kU') > ' ';

Within the COMPRESS function, the third parameter is specifying k=KEEP rather than remove and U=Uppercase letters.

If you want to locate any letter (whether uppercase or lowercase) it would be:

where compress(x, , 'ka') > ' ';

Of course, the SQL interpreter will need to understand how to use COMPRESS.

And if you want a subset of the letters, you would still have to spell them out:

where compress(x, 'ABCDEFGHIJ', 'k') > ' ';

range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Re: range of characters in PROC SQL where statement using LIKE operator?

Registration is open

Registration is open

SAS Training: Just a Click Away