DonH
Lapis Lazuli | Level 10

I need to validate data values before generating an XML file. The schema (XSD) for the XML file specifies patterns for a number of the fields, but they are not standard Perl regexes. XSD patterns are apparently a limited subset, and the syntax is slightly different.

 

If you plug an XSD pattern into prxmatch, it rejects the syntax.

I did find a reference that said you could convert an XSD pattern to a Perl regex by prefixing it with a ^ and suffixing it with a $. But prxparse rejects that also. If you remove the $, it is rejected because it wants a closing ^. Adding a ^ as both the first and last character results in (for the examples I have checked) a valid pattern. But it seems to accept values that are not valid.

 

So does anybody have any advice on how to convert an XSD pattern to something that can be used in SAS?

 

TIA.

PGStats
Opal | Level 21

Give examples of XSD patterns.

PG
DonH
Lapis Lazuli | Level 10

Meant to include a representative set. Thanks for the reminder. Here is a subset from just one of the XSD files.


[A-Z\d\._'\-]+@[A-Z\d_'\-]+\.[A-Z\d\._'\-]+
[A-Z]{2}
[A-Z]{4}\d{6}[MH][A-Z]{5}[0-9]{2}
[A-ZÑ ]{1,200}
[A-ZÑ&]{3,4}\d{6}[A-Z0-9]{3}
[A-ZÑ&]{3}\d{6}[A-Z0-9]{3}
[A-ZÑ&]{4}\d{6}[A-Z0-9]{3}
[A-ZÑ0-9]{1,14}
[A-ZÑ\d #\-\.&,_@'()]{1,254}
[A-ZÑ\d \-\.,':/$]{1,3000}
[A-ZÑ\d \-\.,:/]{1,100}
[A-ZÑ\d \-_\.&,'#@]{1,200}
\d{1,14}\.\d{2}
\d{1,2}
\d{4}[0|1]\d{1}
\d{4}\-\d{1,9}
\d{5}

 

Found this site that discusses the differences. I tried the suggestion to prefix the pattern with a ^ and suffix it with a $. But that did not create an expression that prxparse accepted.

I also found a few sites that decode a pattern into a description, from which I could presumably create a valid Perl regular expression. But given how many of these I have to create, I would prefer to avoid that approach if at all possible.

PGStats
Opal | Level 21

Seems like all you need to add is a pair of delimiters:

 

\d{5}   -->  /\d{5}/  will match a substring of 5 digits

 

or 

 

\d{5} --> /^\d{5}$/ will match if the whole string is 5 digits

PG
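To make the difference between the two forms concrete, here is a minimal sketch of how they could be checked in a DATA step. The variable names and test values are illustrative assumptions, not taken from the original data.

```sas
/* Sketch: compare an unanchored and an anchored version of the same pattern. */
/* Variable names and test values are made up for illustration.               */
data _null_;
  length value $8;
  do value = '12345', '12345x', 'ab12345';
    /* /\d{5}/ succeeds if 5 consecutive digits occur anywhere in the string */
    substring_hit = prxmatch('/\d{5}/', strip(value)) > 0;
    /* /^\d{5}$/ succeeds only if the whole string is exactly 5 digits */
    whole_hit = prxmatch('/^\d{5}$/', strip(value)) > 0;
    put value= substring_hit= whole_hit=;
  end;
run;
```

All three values contain five consecutive digits, so the unanchored form should flag every one of them, while the anchored form should accept only '12345'.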
DonH
Lapis Lazuli | Level 10

Very helpful.  Thx.  So let me first acknowledge that to say I am a novice on regex patterns would give me too much credit.

So, am I correct in assuming that prefixing with /^ and suffixing with $/ will check for an exact match? So, for example, 12345x will fail because it is not an exact match?

PGStats
Opal | Level 21

Right! "12345   " will not match either because of the trailing spaces, but trim("12345   ") will match. 
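The trailing-blank behaviour can be reproduced with a fixed-length character variable, which SAS pads with blanks. This is only a sketch; the variable name zip and its length are assumptions chosen for the demonstration.

```sas
/* Sketch: a $8 variable holding '12345' is stored with three trailing blanks. */
data _null_;
  length zip $8;
  zip = '12345';
  padded   = prxmatch('/^\d{5}$/', zip) > 0;         /* blanks are part of the string */
  trimmed  = prxmatch('/^\d{5}$/', trim(zip)) > 0;   /* trailing blanks removed       */
  stripped = prxmatch('/^\d{5}$/', strip(zip)) > 0;  /* leading and trailing removed  */
  put padded= trimmed= stripped=;
run;
```

TRIM is enough for trailing blanks; STRIP additionally guards against leading blanks in the value.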

PG
DonH
Lapis Lazuli | Level 10

Thanks. I had already thought about that and was doing a strip of the string.
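Pulling the suggestions in this thread together, the wrapping could be done programmatically for a whole list of patterns rather than by hand. The following is a rough sketch under several assumptions: the patterns and test values in the DATALINES block are illustrative (the last pattern is a simplified, ASCII-only variant of one from the list above), the '~' field separator is arbitrary, and no pattern is assumed to already contain an escaped slash. A forward slash inside a pattern is escaped so that it cannot be mistaken for the closing delimiter, and the non-capturing group keeps any alternation inside the anchors.

```sas
/* Sketch: wrap each XSD pattern as /^(?:pattern)$/ and test one value against it. */
data _null_;
  infile datalines dlm='~' truncover;
  length xsd $100 value $40 perl $128;
  input xsd value;

  /* Escape '/' so it cannot end the Perl delimiter early, then anchor the pattern. */
  perl = cats('/^(?:', tranwrd(strip(xsd), '/', '\/'), ')$/');

  rx = prxparse(perl);              /* missing if the pattern does not compile */
  if missing(rx) then put 'Could not compile: ' perl;
  else do;
    ok = prxmatch(rx, strip(value)) > 0;
    put xsd= value= ok=;
    call prxfree(rx);               /* release the pattern compiled for this row */
  end;
  datalines;
\d{5}~12345
\d{5}~12345x
[A-Z]{2}~MX
\d{4}\-\d{1,9}~2024-17
\d{1,14}\.\d{2}~100.5
[A-Z\d :/]{1,100}~A/B: 1
;
```

Compiling with PRXPARSE on every row is simple but not the fastest option. If the same few patterns are applied to many rows, compiling each one once and retaining the pattern ID would likely be preferable.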

ChrisNZ
Tourmaline | Level 20

Note that the pattern

[A-ZÑ0-9]{1,14}

 

can be extended using a POSIX character class

[[:upper:]0-9]{1,14}

 

if you need to catch other accented letters.

The letters matched depend on the encoding. For example, wlatin1 matches most Western European accented letters, such as Ñ (Spanish) or Ø (Danish and Norwegian).

 

Taken from http://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490 

DonH
Lapis Lazuli | Level 10

Thanks. For now this project is using UTF-8 encoding and we only need to support Spanish. But this is a good tip.
