Hi,
I'm using the prxparse and prxmatch functions to check whether the structure of a 20-character string is valid.
Here is a simplified example:
checkit= prxparse("/[A-C][Z][3][1-4][3-5]\d{3}[5][2-7]/i");
match = (prxmatch(checkit, myvar)=1);
This works fine, but I have two questions:
1. Can I incorporate a numeric range spanning more than one character? For example, if the range was 399-401, could I do [399-401], or is that not allowed? I tried it on test data: it didn't give an error, but it didn't find any matches either, so maybe I'm missing something about the right way to set it up.
2. Adding even more complexity: What if I have two numeric ranges based on the values of a separate variable?
I can think of the kluge-y way to do this:
if var1 = 'a' then do;
/*matching sequence with first range*/
end;
else if var1 = 'b' then do;
/*matching sequence with second range*/
end;
Is there a better way, or is that the best I can do?
Thanks!
Hi @Walternate,
@Walternate wrote:
1. Can I incorporate a numeric range spanning more than one character? For example, if the range was 399-401, could I do [399-401], or is that not allowed? I tried it on test data: it didn't give an error, but it didn't find any matches either, so maybe I'm missing something about the right way to set it up.
No, you cannot specify numeric ranges like this. Perl regular expressions operate on characters, and a range like [2-7] relies on the corresponding range of ASCII (or, on some platforms, EBCDIC) codes. For example, [Y-b] is a valid range on an ASCII system and includes, in addition to 'Y', 'Z', 'a' and 'b', the six special characters between 'Z' and 'a' in the ASCII collating sequence (the underscore, for example). However, [b-Y] would be invalid since 'b'>'Y'.
The regex [314-618] would be interpreted as "'3' or '1' or something in the (character!) range '4'-'6' (i.e., '4', '5' or '6') or '1' [redundant] or '8'." Your example [399-401] contains an invalid range (9-4), and I'm surprised to read that you didn't get error messages including "ERROR: Invalid [] range "9-4" ..." In most cases it would be cumbersome to construct a regex matching a range of integers.
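To see the character-by-character interpretation in practice, here is a minimal sketch that tests each single digit against [314-618]. The variable names (rx, c, hit, hit314) are made up for this illustration, and the anchors ^ and $ are added only so that exactly one character is tested. If the interpretation above holds, the log should flag 1, 3, 4, 5, 6 and 8 as hits, and the three-character value '314' should not match.

```sas
data _null_;
  /* Anchored so that only a single-character value can match */
  rx = prxparse('/^[314-618]$/');
  do i = 0 to 9;
    c = put(i, 1.);                 /* one-character digit '0'..'9' */
    hit = (prxmatch(rx, c) > 0);    /* 1 if the digit is in the class */
    put c= hit=;
  end;
  /* A three-digit value is not treated as a number by the class */
  hit314 = (prxmatch(rx, '314') > 0);
  put hit314=;
run;
```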
@Walternate wrote:
2. Adding even more complexity: What if I have two numeric ranges based on the values of a separate variable?
I can think of the kluge-y way to do this:
if var1 = 'a' then do;
/*matching sequence with first range*/
end;
else if var1 = 'b' then do;
/*matching sequence with second range*/
end;
Is there a better way, or is that the best I can do?
As mentioned above, "numeric ranges" are difficult to implement correctly in a regex. I would rather extract the "numeric range" part of the string (e.g., with PRXPOSN or, in simple cases, with SUBSTR), convert it to an integer with the INPUT function, and perform a numeric comparison like 399<=n<=401 as part of the matching process. Then it's easy to include a second range: PRXMATCH criterion & (var1='a' & first range check | var1='b' & second range check). Here, the "PRXMATCH criterion" would address the other parts of the string, excluding the "numeric range."
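A minimal sketch of this approach, under several assumptions that are not in the original question: the three digits matched by \d{3} in the simplified example are taken to be the "numeric range" part, the second range (500-510) is invented, and the test records and the data set name check are hypothetical. The parentheses around \d{3} create capture buffer 1, which PRXPOSN returns as text.

```sas
data check;
  retain checkit;
  /* Compile the pattern once; (\d{3}) captures the assumed numeric part */
  if _n_ = 1 then checkit = prxparse('/[A-C]Z3[1-4][3-5](\d{3})5[2-7]/i');
  input var1 $ myvar :$20.;
  match = 0;
  /* Structural check first: the match must start at position 1 */
  if prxmatch(checkit, myvar) = 1 then do;
    n = input(prxposn(checkit, 1, myvar), 3.);   /* captured digits as a number */
    /* Range depends on var1; the range for 'b' is a made-up example */
    match = (var1 = 'a' & 399 <= n <= 401)
          | (var1 = 'b' & 500 <= n <= 510);
  end;
  datalines;
a AZ31440052
a AZ31440252
b AZ31450552
;
run;
```

With these test records, the first and third rows should end up with match=1 (400 is in the range for 'a', 505 in the assumed range for 'b'), while the second row (402 with var1='a') should get match=0.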
Nothing to add to @FreelanceReinh's reply.
Note that your expression could be slightly simpler:
prxparse("/[A-C]Z3[1-4][3-5]\d{3}5[2-7]/i")
For a small range like 399-401, you could use alternation and search for (399|400|401).
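As a sketch, the alternation could be placed where \d{3} sits in the simplified pattern. That position is an assumption, as are the two test values.

```sas
data _null_;
  retain checkit;
  /* The alternation replaces \d{3}; only 399, 400 or 401 is accepted there */
  if _n_ = 1 then checkit = prxparse('/[A-C]Z3[1-4][3-5](399|400|401)5[2-7]/i');
  input myvar :$20.;
  match = (prxmatch(checkit, myvar) = 1);
  put myvar= match=;
  datalines;
AZ31440052
AZ31440252
;
run;
```

This stays readable for a handful of values, but it grows quickly for wider ranges, which is where the numeric comparison is easier to maintain.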
@Walternate wrote:
Hi,
2. Adding even more complexity: What if I have two numeric ranges based on the values of a separate variable?
Sounds like a job for IF/Then:
If var=<whatever value> (or If var in (<list of values>)) then <the code for one search>;
Else if var in (<other list of values>) then <the code for the other search>;
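A hedged sketch of the IF/THEN variant with two precompiled patterns, one per value of var1. The pattern names rx_a and rx_b, the second range (500-510), the data set name check2 and the test records are all hypothetical.

```sas
data check2;
  retain rx_a rx_b;
  if _n_ = 1 then do;
    /* First range: 399-401 */
    rx_a = prxparse('/[A-C]Z3[1-4][3-5](399|400|401)5[2-7]/i');
    /* Second range (made up for illustration): 500-510 */
    rx_b = prxparse('/[A-C]Z3[1-4][3-5](50[0-9]|510)5[2-7]/i');
  end;
  input var1 $ myvar :$20.;
  /* Choose the pattern according to var1 */
  if var1 = 'a' then match = (prxmatch(rx_a, myvar) = 1);
  else if var1 = 'b' then match = (prxmatch(rx_b, myvar) = 1);
  else match = 0;
  datalines;
a AZ31440052
b AZ31450552
b AZ31440052
;
run;
```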