Dear SAS Community,
I have a string variable in my data that contains a mixture of alphanumeric and special characters (really anything is possible). My task is to identify only certain cases (the valid ones). Here is a minimal working example:
Example Data:
var1
Abc 123456
Abc 1234567
B aBc 123
A
AbC 123
aBC 1243
123 abc 123
aBc123
Abc 12345 Abc 12345
abc 345
I would only like to find the highlighted cases: values that start with three letters (upper or lower case) followed by digits (at least one is required, at most six are allowed). One space between the letters and the digits is allowed but not required. I have tried, for example, the following code:
DATA WORK.DATA_02;
SET WORK.DATA_01;
FOUND = 0;
if PRXMATCH ("/^[Aa][Bb][Cc] ?[1-9][0-9]?[0-9]?[0-9]?[0-9]?[0-9]?$/",var1) > 0 then FOUND = 1;
RUN;
Besides that, I have also tried the code without the $ at the end, with different \b word boundaries, and with hundreds of other combinations. Unfortunately, nothing works. The strange thing is that the pattern
^([Aa][Bb][Cc]\ ?[0-9][0-9]?[0-9]?[0-9]?[0-9]?[0-9]?)$
works in many online regex-checkers, but not within SAS.
Can anybody help? Any hints, ideas or solutions? I have been desperately looking for an answer for days...
Thank you very much in advance!
Best regards
Lars
Accepted Solutions
Note that [:alpha:] matches all letters in the collating sequence used, including for example é.
This might be better. Or not.
prxmatch('/^[a-z]{3} ?\d{1,6}$/oi',strip(VAR))
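For anyone who wants to try the one-liner above in context, here is a minimal sketch of a complete DATA step built around it. The data set name work.demo and the flag variable found are placeholders of mine, not names from the original question, and the sample rows are a subset of the example data.

```sas
/* Sketch only: work.demo and found are hypothetical names. */
data work.demo;
    infile datalines truncover;
    input var1 $40.;
    /* i = ignore case, o = compile the pattern only once.       */
    /* strip() removes the trailing blanks that pad var1 to 40.  */
    found = prxmatch('/^[a-z]{3} ?\d{1,6}$/oi', strip(var1)) > 0;
    datalines;
Abc 123456
Abc 1234567
B aBc 123
aBc123
abc 345
;
run;

proc print data=work.demo;
run;
```

With this pattern, the rows Abc 123456, aBc123 and abc 345 should get found=1; Abc 1234567 has seven digits and B aBc 123 does not start with three letters, so both should get found=0.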
I believe the RegEx below will meet your requirement.
You might find this link helpful.
data have;
infile datalines truncover dlm='|' dsd;
input select :$1. var1 :$40.;
datalines;
1|Abc 123456
0|Abc 1234567
0|B aBc 123
0|A
0|AbC 123
1|aBC 1243
0|123 abc 123
1|aBc123
0|Abc 12345 Abc 12345
1|abc 345
;
/*I only would like to find the highlighted cases, starting with three letters (upper or lower case possible) */
/*and followed by numbers (one is required, a max. of six is possible). */
/*Between the letters and the numbers one space is allowed but not required.*/
data want;
set have;
selected= prxmatch('/^[[:alpha:]]{3} ?\d{1,6}$/oi',strip(var1))>0;
run;
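A note on why strip() is there: as far as I can tell, the reason the original pattern works in online regex checkers but not in SAS is that a SAS character variable is padded with trailing blanks up to its declared length, so $ never sees the end of the string right after the last digit. The sketch below illustrates this under that assumption; the variable names padded, stripped and tolerant are made up for the demonstration.

```sas
/* Sketch only: shows how trailing blanks interact with the $ anchor. */
data _null_;
    length var1 $40;
    var1 = 'Abc 123456';   /* stored as 10 characters plus 30 trailing blanks */

    /* Trailing blanks follow the digits, so $ cannot match here. */
    padded   = prxmatch('/^[a-z]{3} ?\d{1,6}$/i', var1);

    /* strip() removes the padding before the pattern is applied. */
    stripped = prxmatch('/^[a-z]{3} ?\d{1,6}$/i', strip(var1));

    /* Alternative: let the pattern itself accept trailing whitespace. */
    tolerant = prxmatch('/^[a-z]{3} ?\d{1,6}\s*$/i', var1);

    put padded= stripped= tolerant=;
run;
```

The log should show padded=0 stripped=1 tolerant=1. Either strip() or a trailing \s* in the pattern does the job; strip() keeps the pattern identical to what you would test in an online checker.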
Hi Patrick,
thank you very much for your fast reply and the additional link. This really helped me a lot and I think this is the solution!
Thanks! 😀
Lars