I need to validate data values before generating an XML file. The XSD for the file specifies patterns for a number of the fields, but they are not standard Perl regexes. XSD patterns are apparently a limited subset, and the syntax is slightly different.
If you plug an XSD pattern into prxmatch, it rejects the syntax.
I did find a reference that said you could take an XSD pattern, prefix it with a ^, and suffix it with a $ to convert it to a Perl regex. But prxparse rejects that as well. If you remove the $, it is rejected because it wants a closing ^. Adding a ^ as both the first and last character results in a valid pattern (for the examples I have checked), but it seems to accept patterns that are not valid.
So does anybody have any advice on how to convert an XSD pattern to something that can be used in SAS?
TIA.
Give examples of XSD patterns.
Meant to include a representative set. Thanks for the reminder. Here is a subset from just one of the XSD files.
[A-Z\d\._'\-]+@[A-Z\d_'\-]+\.[A-Z\d\._'\-]+
[A-Z]{2}
[A-Z]{4}\d{6}[MH][A-Z]{5}[0-9]{2}
[A-ZÑ ]{1,200}
[A-ZÑ&]{3,4}\d{6}[A-Z0-9]{3}
[A-ZÑ&]{3}\d{6}[A-Z0-9]{3}
[A-ZÑ&]{4}\d{6}[A-Z0-9]{3}
[A-ZÑ0-9]{1,14}
[A-ZÑ\d #\-\.&,_@'()]{1,254}
[A-ZÑ\d \-\.,':/$]{1,3000}
[A-ZÑ\d \-\.,:/]{1,100}
[A-ZÑ\d \-_\.&,'#@]{1,200}
\d{1,14}\.\d{2}
\d{1,2}
\d{4}[0|1]\d{1}
\d{4}\-\d{1,9}
\d{5}
I found this site that discusses the differences. I tried the suggestion to prefix the pattern with a ^ and suffix it with a $, but that did not create an expression that prxparse accepted.
I also found a few sites that decode the pattern into a description, from which I could presumably create a valid Perl regex. But given how many of these I have to create, I would prefer to avoid that approach if at all possible.
Seems like all you need to add is a pair of delimiters
\d{5} --> /\d{5}/ will match a substring of 5 digits
or
\d{5} --> /^\d{5}$/ will match if the whole string is 5 digits
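To apply this to many patterns at once, the wrapping can be done in a DATA step rather than by hand. The following is a minimal sketch, not a definitive implementation: the dataset and variable names (xsd_patterns, xsd, sample, checked) and the sample values are hypothetical. It assumes that XSD patterns are implicitly anchored to the whole value, so the pattern is wrapped in ^(?: ... )$ to keep any alternation inside the anchors. It also assumes that an unescaped / inside the pattern would be read as the closing delimiter, so each / is escaped first.

```sas
/* Hypothetical input: one XSD pattern and one sample value per row */
data xsd_patterns;
  infile datalines dlm='|' truncover;
  length xsd $60 sample $40;
  input xsd $ sample $;
  datalines;
\d{5}|12345
\d{5}|12345x
[A-Z]{2}|MX
\d{4}\-\d{1,9}|2015-123
;
run;

data checked;
  set xsd_patterns;
  length perl $200;
  /* Escape / so it is not taken as the delimiter, then anchor the whole pattern */
  perl = cats('/^(?:', tranwrd(xsd, '/', '\/'), ')$/');
  rx = prxparse(perl);                        /* compile the wrapped pattern */
  valid = (prxmatch(rx, strip(sample)) > 0);  /* 1 = whole value matches     */
  call prxfree(rx);                           /* release the compiled regex  */
run;
```

With these sample rows, only 12345x should come out with valid=0. Compiling the pattern on every row is simple but not the fastest option; if the same few patterns are reused across many values, compiling each one once and retaining the identifier may be preferable.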
Very helpful. Thx. Let me first acknowledge that calling me a novice on regex patterns would give me too much credit.
So, am I correct in assuming that prefixing with /^ and suffixing with $/ will check for an exact match? For example, 12345x will fail because it is not an exact match?
Right! "12345 " will not match either because of the trailing spaces, but trim("12345 ") will match.
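The trailing-blank behavior is easy to reproduce, because a SAS character variable is padded with blanks to its declared length. A small sketch, where the variable names (value, padded, stripped) are purely illustrative:

```sas
data _null_;
  length value $10;                        /* '12345' is stored with 5 trailing blanks */
  value = '12345';
  rx = prxparse('/^\d{5}$/');
  padded   = prxmatch(rx, value);          /* 0: the blanks sit between the digits and $ */
  stripped = prxmatch(rx, strip(value));   /* 1: the match starts at position 1          */
  put padded= stripped=;
run;
```

An alternative would be to allow the padding in the pattern itself, for example by ending it with \s*$, but stripping the value keeps the converted XSD pattern unchanged.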
Thanks. I had already thought about that and was doing a strip of the string.
Note that the pattern
[A-ZÑ0-9]{1,14}
can be extended using a POSIX character class
[[:upper:]0-9]{1,14}
if you need to catch other accented letters.
The letters matched depend on the encoding. For example, wlatin1 matches most Western European accented letters, such as Ñ (Spanish) or Ø (Danish and Norwegian).
Taken from http://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490
Thanks. For now this project is using UTF-8 encoding and we only need to support Spanish. But this is a good tip.