Help using Base SAS procedures

range of characters in PROC SQL where statement using LIKE operator?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

range of characters in PROC SQL where statement using LIKE operator?

[ Edited ]

How do you select a range of characters in PROC SQL using the LIKE operator?

 

For example:

 

DATA test;
	INPUT n x $;
	cards;
	1 abcd
	2 1000
	3 EF3H
	4 _4*4
	;
RUN;

PROC SQL;
	SELECT n, x
	FROM test
	WHERE x LIKE "%[A-Z]%"
	;
QUIT;

Row 3 should be selected, but when I run it no rows are selected.

 

EDIT: my main goal is to select rows in which there are letters in the string. I get that you could do

if findc(lowcase(x),"abcdefghijklmnopqrstuvwxyz") > 0 

in a data step, but just wondering if there's an equivalent in PROC SQL since I'm already using it to query a SQL database. Also I would want a flexible way of doing something like this, so maybe I'm looking for strings where any character could be between A and L, or J and X etc.

 


Accepted Solutions
Solution
‎01-19-2016 01:00 PM
Super User
Posts: 5,085

Re: range of characters in PROC SQL where statement using LIKE operator?

I read your edit ... it seems like you are searching for any uppercase letter anywhere in the string, but ignoring lowercase letters.  If that's the case, this would be the key condition:

 

where compress(x, , 'kU') > ' ';

 

Within the COMPRESS function, the third parameter is specifying k=KEEP rather than remove and U=Uppercase letters.

 

If you want to locate any letter (whether uppercase or lowercase) it would be:

 

where compress(x, , 'ka') > ' ';

 

Of course, the SQL interpreter will need to understand how to use COMPRESS.

 

And if you want a subset of the letters, you would still have to spell them out:

 

where compress(x, 'ABCDEFGHIJ', 'k') > ' ';

 

 

 

 

View solution in original post


All Replies
Super User
Super User
Posts: 7,404

Re: range of characters in PROC SQL where statement using LIKE operator?

[ Edited ]

You could use perl regular expressions.  I would use a small trick however.  Compress function can remove certain sets of characters, then you can check the length:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EFGH
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST	
where lengthn(compress(X,' ','a'))=0; quit;

What I do is remove all the alphanumeric characters and blanks (blank from the second parameter, all characters with the 'a' option - check out the SAS docs for other options).  Then I take the length of the string and if it is zero then it only had those characters so we keep it.  Its basically reversing the thinking.

 

As a note, don't use tabs in your code, use 2 spaces, otherwise it renders differently in different editors.

Occasional Contributor
Posts: 7

Re: range of characters in PROC SQL where statement using LIKE operator?

Ehh, tabs makes more sense since people like to see different indentation lengths. You could configure your editor to show however many spaces you like.

 

Anyway please see my edit. I'm trying to find rows in which the string has letter characters.

Super User
Super User
Posts: 7,404

Re: range of characters in PROC SQL where statement using LIKE operator?

Update to my code to reflect your change in logic:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EF3H
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST
  where  lengthn(compress(X,' ','d')) >0;
quit;

As for tabs, yes I could set my editors up to follow that logic, and change the setup of my browser to display them.  However I amnot going to do that as I would forever be changing setups.  Two spaces always render as two spaces, tabs render differently in every application, even posting your code above gives different results.

Occasional Contributor
Posts: 7

Re: range of characters in PROC SQL where statement using LIKE operator?

I don't think this will work though. What if there are symbols like period or underscore, etc.

In any case, I'm not specifically looking for solutions that has all alphabetic characters. That's just one example. I would want a solution that would work if I only wanted to search for string that contains any letter between say 'c' and 'k', etc.
Super User
Super User
Posts: 7,404

Re: range of characters in PROC SQL where statement using LIKE operator?

Ok, lets try this one then:

data test;
  input n x $;
cards;
1 abcd
2 1000
3 EF3H
4 efgh
;
run;

proc sql;
  create table WANT as
  select *
  from   TEST
  where  findc(X,'a','a');
quit;

Find will return the first position of a character given in the second parameter, and I use the 'a' third parameter to add all alphabet to the list.

Super User
Posts: 5,085

Re: range of characters in PROC SQL where statement using LIKE operator?

As long as  you intend to examine the beginning of a character string, there is no need to use LIKE.  You can take advantage of the fact that there are no characters between "A" and "Z" in an ASCII collating sequence:

 

where x >= 'A' and x < '[';

 

Note that "[" is the next character following "Z" in the ASCII collating sequence.

Occasional Contributor
Posts: 7

Re: range of characters in PROC SQL where statement using LIKE operator?

Please see my edit. I'm not just searching for the first character, but in any character in the string where there's an alphabetic character.
Solution
‎01-19-2016 01:00 PM
Super User
Posts: 5,085

Re: range of characters in PROC SQL where statement using LIKE operator?

I read your edit ... it seems like you are searching for any uppercase letter anywhere in the string, but ignoring lowercase letters.  If that's the case, this would be the key condition:

 

where compress(x, , 'kU') > ' ';

 

Within the COMPRESS function, the third parameter is specifying k=KEEP rather than remove and U=Uppercase letters.

 

If you want to locate any letter (whether uppercase or lowercase) it would be:

 

where compress(x, , 'ka') > ' ';

 

Of course, the SQL interpreter will need to understand how to use COMPRESS.

 

And if you want a subset of the letters, you would still have to spell them out:

 

where compress(x, 'ABCDEFGHIJ', 'k') > ' ';

 

 

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 1464 views
  • 2 likes
  • 3 in conversation