SAS Programming

DonH · Posted 05-05-2016 10:15 AM

I need to validate data values before generating an XML file. The XSD files specify patterns for a number of the fields, but they are not standard Perl regexes. XSD patterns are apparently a limited subset, and the syntax is slightly different.

If you plug an XSD pattern into prxmatch, it rejects the syntax.

I did find a reference that said you could take an XSD pattern and prefix it with a ^ and suffix it with a $ to convert it to a Perl regex. But prxparse rejects that also. If you remove the $, it rejects it because it wants a closing ^. Adding a ^ as the first and last character results in (for the examples I have checked) a pattern that parses. But it seems to accept values that are not valid.

So does anybody have any advice on how to convert an XSD pattern to something that can be used in SAS?

TIA.

PGStats · Posted 05-05-2016 03:17 PM

Give examples of XSD patterns.

PG

DonH · Posted 05-05-2016 03:56 PM

Meant to include a representative set. Thanks for the reminder. Here is a subset from just one of the XSD files.

[A-Z\d\._'\-]+@[A-Z\d_'\-]+\.[A-Z\d\._'\-]+
[A-Z]{2}
[A-Z]{4}\d{6}[MH][A-Z]{5}[0-9]{2}
[A-ZÑ ]{1,200}
[A-ZÑ&]{3,4}\d{6}[A-Z0-9]{3}
[A-ZÑ&]{3}\d{6}[A-Z0-9]{3}
[A-ZÑ&]{4}\d{6}[A-Z0-9]{3}
[A-ZÑ0-9]{1,14}
[A-ZÑ\d #\-\.&,_@'()]{1,254}
[A-ZÑ\d \-\.,':/$]{1,3000}
[A-ZÑ\d \-\.,:/]{1,100}
[A-ZÑ\d \-_\.&,'#@]{1,200}
\d{1,14}\.\d{2}
\d{1,2}
\d{4}[0|1]\d{1}
\d{4}\-\d{1,9}
\d{5}

Found this site that discusses the differences. I tried the suggestion to prefix the pattern with a ^ and suffix it with a $. But that did not create an expression that prxparse accepted.

I also found a few sites that decode the pattern into a description, from which I could presumably create a valid Perl regex. But given how many of these I have to create, I would prefer to avoid that approach if at all possible.

PGStats · Posted 05-05-2016 04:54 PM

Seems like all you need to add is a pair of delimiters

\d{5} --> /\d{5}/ will match a substring of 5 digits

or

\d{5} --> /^\d{5}$/ will match if the whole string is 5 digits
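To make the difference between the two forms concrete, here is a minimal DATA step sketch. The test values and variable names are made up for illustration; only the two patterns come from the post.

```sas
data _null_;
   length value $10;
   /* Compile each pattern once; the slashes are the delimiters PRXPARSE expects. */
   rx_sub  = prxparse('/\d{5}/');     /* 5 digits anywhere in the string */
   rx_full = prxparse('/^\d{5}$/');   /* the whole string is 5 digits    */
   do value = '12345', '12345x', 'ab12345';
      /* strip() removes the blank padding of the fixed-length variable */
      sub_hit  = prxmatch(rx_sub,  strip(value)) > 0;
      full_hit = prxmatch(rx_full, strip(value)) > 0;
      put value= sub_hit= full_hit=;
   end;
run;
```

All three values should report sub_hit=1, while only 12345 should report full_hit=1.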

PG

DonH · Posted 05-05-2016 04:58 PM

Very helpful. Thx. So let me first acknowledge that to say I am a novice on regex patterns would give me too much credit.

So, am I correct in assuming that prefixing with /^ and suffixing with $/ will check for an exact match? So, for example, 12345x will fail because it is not an exact match?

PGStats · Posted 05-05-2016 05:31 PM

Right! "12345 " will not match either because of the trailing spaces, but trim("12345 ") will match.
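A small sketch of the padding effect, assuming a character variable declared longer than the value it holds (the variable names are illustrative):

```sas
data _null_;
   length value $10;            /* '12345' is stored followed by 5 blanks */
   value = '12345';
   rx = prxparse('/^\d{5}$/');
   padded   = prxmatch(rx, value);          /* trailing blanks block the $ anchor */
   trimmed  = prxmatch(rx, trim(value));    /* trailing blanks removed            */
   stripped = prxmatch(rx, strip(value));   /* leading and trailing removed       */
   put padded= trimmed= stripped=;
run;
```

This should write padded=0 trimmed=1 stripped=1 to the log. trim() only removes trailing blanks, so strip() is the safer choice if values can also arrive with leading blanks.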

PG

DonH · Posted 05-05-2016 05:32 PM

Thanks. I had already thought about that and was doing a strip of the string.
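Putting the delimiter wrapping and the strip together, the whole list of XSD patterns could be converted in one pass instead of by hand. The following is only a sketch under stated assumptions: the data set and variable names (xsd_patterns, prx_patterns, xsd, prx, valid) are hypothetical, the sample patterns are ASCII-only versions of ones listed earlier in the thread, and it assumes no pattern already contains an escaped slash. A literal / inside a pattern may be read as the closing delimiter, so the sketch escapes it; the (?: ) group is there so that a top-level | in a pattern stays inside the anchors.

```sas
/* Hypothetical input: one XSD pattern per row. */
data xsd_patterns;
   infile datalines truncover;
   input xsd $char60.;
   datalines;
\d{5}
[A-Z]{2}
\d{4}\-\d{1,9}
[A-Z\d \-\.,:/]{1,100}
;
run;

data prx_patterns;
   set xsd_patterns;
   length prx $80;
   /* Escape literal slashes (assumes none are escaped already),   */
   /* then anchor the pattern and add the / delimiters.            */
   prx = cats('/^(?:', tranwrd(strip(xsd), '/', '\/'), ')$/');
   /* PRXPARSE returns a missing value when the pattern is invalid. */
   rx = prxparse(strip(prx));
   valid = (rx > 0);
   if rx > 0 then call prxfree(rx);
   drop rx;
run;
```

Rows with valid=0 would still need a manual look. The resulting prx strings can then be passed to prxmatch together with strip() of the value being checked.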

ChrisNZ · Posted 05-05-2016 09:48 PM

Note that if you want to match accented letters, the pattern

[A-ZÑ0-9]{1,14}

can be extended using a POSIX character class

[[:upper:]0-9]{1,14}

if you need to catch other accented letters.

The letters matched depend on the encoding. For example, under wlatin1 the class matches most Western European accented capitals, such as Ñ (Spanish) or Ø (Danish and Norwegian).
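A minimal sketch of the POSIX class in an anchored pattern, using ASCII-only test values so that the result does not depend on the session encoding (the values and variable names are illustrative):

```sas
data _null_;
   length value $14;
   rx = prxparse('/^[[:upper:]0-9]{1,14}$/');
   do value = 'ABC123', 'abc123';
      /* hit is 1 when the whole stripped value matches the pattern */
      hit = prxmatch(rx, strip(value)) > 0;
      put value= hit=;
   end;
run;
```

ABC123 should match and abc123 should not. Whether an accented capital such as Ñ matches is the encoding-dependent part; in a UTF-8 session Ñ occupies two bytes, so it is worth testing with real values rather than assuming.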

Taken from http://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490

High-Performance SAS Coding - Third Edition

DonH · Posted 05-06-2016 07:36 AM

Thanks. For now this project is using UTF-8 encoding and we only need to support Spanish. But this is a good tip.
