Walternate

Hi,

 

I'm using the prxparse and prxmatch functions to check whether the structure of a 20-character string is valid. 

 

Here is a simplified example:

 checkit= prxparse("/[A-C][Z][3][1-4][3-5]\d{3}[5][2-7]/i");
match = (prxmatch(checkit, myvar)=1);

 

This works fine, but I have two questions:

 

1. Can I incorporate a numeric range spanning more than one character? For example, if the range was 399-401, could I do [399-401], or is that not allowed? I tried using test data and it didn't give an error but was not able to find matches, but maybe I'm missing something about the right way to set it up.

 

2. Adding even more complexity: What if I have two numeric ranges based on the values of a separate variable?

 

I can think of the kluge-y way to do this:

 

if var1 = 'a' then do;

       /*matching sequence with first range*/

end;

else if var1 = 'b' then do;

     /*matching sequence with second range*/

end;

 

Is there a better way, or is that the best I can do?

 

Thanks!

FreelanceReinh

Hi @Walternate,


@Walternate wrote:

1. Can I incorporate a numeric range spanning more than one character? For example, if the range was 399-401, could I do [399-401], or is that not allowed? I tried using test data and it didn't give an error but was not able to find matches, but maybe I'm missing something about the right way to set it up.


No, you cannot specify numeric ranges like this. Perl regular expressions operate on characters, not on numbers. Ranges like [2-7] rely on the order of the character codes in the ASCII (or EBCDIC) collating sequence. For example, [Y-b] is a valid range on an ASCII system and includes (in addition to 'Y', 'Z', 'a' and 'b') the six special characters between 'Z' and 'a' in the ASCII collating sequence, e.g., the underscore. However, [b-Y] would be invalid since 'b'>'Y'.

Moreover, a character class in square brackets always matches exactly one character. The regex [314-618] would match a single character that is "'3' or '1' or something in the (character!) range '4'-'6' (i.e., '4', '5' or '6') or '1' [redundant] or '8'." Your example [399-401] contains an invalid range (9-4), and I'm surprised to read that you didn't get error messages including "ERROR: Invalid [] range "9-4" ..." In most cases it would be cumbersome to construct a regex matching a range of integers.
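A small DATA step sketch can make this concrete. It assumes an ASCII-based SAS session, and the test values and dataset name range_demo are made up for illustration. Because a character class cannot express 399-401, the range is spelled out by hand as the alternation ^(399|40[01])$; a second step shows that the character range [Y-b] really does match an underscore.

```sas
/* Sketch (assumes an ASCII-based SAS session; sample values are made up).
   A character class cannot express the integers 399-401, so the range is
   written as an alternation of fixed-width pieces instead. */
data range_demo;
   length n $3;
   input n $;
   /* ^ and $ anchor the pattern to the whole 3-character value */
   in_range = (prxmatch('/^(399|40[01])$/', n) > 0);
   datalines;
398
399
400
401
402
;
run;

/* [Y-b] is a range of character codes: on ASCII it also covers '_',
   which lies between 'Z' and 'a', so POS should be 1 here */
data _null_;
   pos = prxmatch('/[Y-b]/', '_');
   put pos=;
run;
```

Note that every integer range needs its own hand-built pattern this way, which is why extracting the digits and comparing numerically is usually easier.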

 


@Walternate wrote:

2. Adding even more complexity: What if I have two numeric ranges based on the values of a separate variable?

 

I can think of the kluge-y way to do this:

 

if var1 = 'a' then do;

       /*matching sequence with first range*/

end;

else if var1 = 'b' then do;

     /*matching sequence with second range*/

end;

 

Is there a better way, or is that the best I can do?

As mentioned above, "numeric ranges" are difficult to implement correctly in a regex. I would rather extract the "numeric range" part of the string (e.g., with PRXPOSN or, in simple cases, with SUBSTR), convert it to a number with the INPUT function, and perform a numeric comparison like 399<=n<=401 as part of the matching process. Then it's easy to include a second range: PRXMATCH criterion & (var1='a' & first range check | var1='b' & second range check). Here, the "PRXMATCH criterion" would cover only the other parts of the string, excluding the "numeric range."
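The extract-and-compare approach might look like the following sketch. It assumes that the numeric part is the \d{3} field from the question's pattern, captured here as group 1; the second range (100-250), the sample strings, and the names checked, rx and n are hypothetical.

```sas
/* Hypothetical sketch: the regex checks the fixed parts of the string and
   captures the 3-digit field; the range check is a plain numeric comparison.
   Ranges 399-401 (var1='a') and 100-250 (var1='b') are made-up examples. */
data checked;
   length myvar $20 var1 $1;
   retain rx;
   if _n_ = 1 then rx = prxparse('/^[A-C]Z3[1-4][3-5](\d{3})5[2-7]/i');
   input myvar $ var1 $;

   match = 0;
   if prxmatch(rx, myvar) then do;
      /* PRXPOSN returns the text of capture group 1 from the last PRXMATCH */
      n = input(prxposn(rx, 1, myvar), 3.);
      match = (var1 = 'a' and 399 <= n <= 401)
           or (var1 = 'b' and 100 <= n <= 250);
   end;
   drop rx;
   datalines;
AZ31440052XXXXXXXXXX a
AZ31440252XXXXXXXXXX a
AZ31420052XXXXXXXXXX b
AZ31440052XXXXXXXXXX b
;
run;
```

Adding a third range would only mean adding another "or (...)" term, without touching the regex.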

ChrisNZ

Nothing to add to @FreelanceReinh's reply.

Note that your expression could be slightly simpler:

  prxparse("/[A-C]Z3[1-4][3-5]\d{3}5[2-7]/i")
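A quick way to confirm that the simplification does not change behavior is to run both patterns on the same values. The sample strings and the dataset name compare_patterns are made up for illustration; as in the question, a match counts only if it starts at position 1.

```sas
/* Sketch: [Z], [3] and [5] are one-character classes, so the bare
   characters Z, 3 and 5 should give identical results. */
data compare_patterns;
   length myvar $20;
   input myvar $;
   match_old = (prxmatch('/[A-C][Z][3][1-4][3-5]\d{3}[5][2-7]/i', myvar) = 1);
   match_new = (prxmatch('/[A-C]Z3[1-4][3-5]\d{3}5[2-7]/i', myvar) = 1);
   same = (match_old = match_new);
   datalines;
AZ31440052XXXXXXXXXX
bz31440052XXXXXXXXXX
DZ31440052XXXXXXXXXX
AZ31440062XXXXXXXXXX
XAZ31440052XXXXXXXXX
;
run;
```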

 

For a small range like 399-401, you could search for (399|400|401).
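Embedded in the full pattern, the alternation needs its own parentheses; otherwise | would split the entire regex into separate alternatives. This sketch assumes, hypothetically, that the \d{3} field is the one restricted to 399-401; the sample strings are made up.

```sas
/* Sketch: the 3-digit field (assumed) must be 399, 400 or 401.
   The parentheses keep | from applying to the whole pattern. */
data alt_demo;
   length myvar $20;
   input myvar $;
   match = (prxmatch('/^[A-C]Z3[1-4][3-5](399|400|401)5[2-7]/i', myvar) > 0);
   datalines;
AZ31439952XXXXXXXXXX
AZ31440152XXXXXXXXXX
AZ31440252XXXXXXXXXX
;
run;
```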

ballardw

@Walternate wrote:


2. Adding even more complexity: What if I have two numeric ranges based on the values of a separate variable?

 


Sounds like a job for IF/Then:

 

If var=<whatever value> (or If var in (<list of values>)) then <the code for one search>;

Else if var in (<other list of values>) then <the code for the other search>;
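One way to fill in this IF/THEN outline is to compile one pattern per group once and choose between them by var1. The second range (100-199, written as 1\d\d), the sample data, and the names rx_a and rx_b are hypothetical.

```sas
/* Sketch of the IF/THEN approach: one compiled pattern per var1 value.
   Range 399-401 for 'a' and 100-199 for 'b' are made-up examples. */
data ifthen_demo;
   length myvar $20 var1 $1;
   retain rx_a rx_b;
   if _n_ = 1 then do;
      rx_a = prxparse('/^[A-C]Z3[1-4][3-5](399|40[01])5[2-7]/i');
      rx_b = prxparse('/^[A-C]Z3[1-4][3-5]1\d\d5[2-7]/i');
   end;
   input myvar $ var1 $;
   if var1 = 'a' then match = (prxmatch(rx_a, myvar) > 0);
   else if var1 = 'b' then match = (prxmatch(rx_b, myvar) > 0);
   else match = 0;   /* any other var1 value: treat as no match */
   drop rx_a rx_b;
   datalines;
AZ31440052XXXXXXXXXX a
AZ31415052XXXXXXXXXX b
AZ31415052XXXXXXXXXX a
;
run;
```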
