Hi all, I'm new to SAS and would like to extract entities from text. The predefined concepts are inaccurate, so I'm trying to use custom regex rules.
The sample text looks like this:
=====================
Email header
Name
Jon Doe
Designation
Super Spy
Email header2
Full name
Sam Smith Junior
Designation
Super Spy2
=====================
So I defined a rule: REGEX:\n(?:Name|Full name)\s?\n[^\n]+\n
I intend to match based on:
1) Start with a newline character \n
2) Non-capturing group (?:Name|Full name). [However, I noticed that "Name" or "Full name" is still being captured regardless]
3) Optional space character after Name or Full name
4) match a new line character \n
5) capture the full name using [^\n]+
6) End with newline character \n
This REGEX rule returns 0 matches; the issue seems to come from the newline character match.
Kindly advise please!
Allow me to add some more context please! The reason I'm trying regex instead of other means is that my source data is very unstructured, and I'm trying to use the REGEX rule type to capture very specific scenarios.
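On point 2 above: a non-capturing group only means the label gets no numbered group of its own; it is still part of the overall match. The sketch below shows this with Python's re module, which is an assumption used purely for illustration. It is not the SAS REGEX rule engine, which may treat \n differently, so a match here does not mean the rule will match in VTA. The names SAMPLE, RULE and WITH_GROUP are hypothetical.

```python
import re

# The sample text from the post, assuming plain \n (0A) line endings.
SAMPLE = (
    "Email header\nName\nJon Doe\nDesignation\nSuper Spy\n"
    "Email header2\nFull name\nSam Smith Junior\nDesignation\nSuper Spy2\n"
)

# The rule exactly as written in the post (without the REGEX: prefix).
RULE = r"\n(?:Name|Full name)\s?\n[^\n]+\n"

# Same rule, with a capturing group around the part that holds the name.
WITH_GROUP = r"\n(?:Name|Full name)\s?\n([^\n]+)\n"

# No capturing group: findall returns the whole match, label included.
full_matches = re.findall(RULE, SAMPLE)

# One capturing group: findall returns only the captured name.
names = re.findall(WITH_GROUP, SAMPLE)

print(full_matches)
print(names)
```

With these assumptions the whole match still contains "Name" or "Full name", while the capturing group isolates just the name line.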
You could use
\x0D\x0A
or
\x0A
in place of the newline \n.
Try using the $hex. format to check whether your newline character is 0A or 0D0A.
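As a rough illustration of the same idea outside SAS, the sketch below uses Python (an assumption, not the $hex. format itself and not VTA) to print the bytes of a text in hex and to build a pattern that accepts either 0D0A or 0A as the line break. The names newline_hex, NL and TOLERANT_RULE are hypothetical.

```python
import re

def newline_hex(data: bytes) -> str:
    """Report which line-ending bytes appear in the data: 0D0A, 0A, or none."""
    if b"\r\n" in data:
        return "0D0A"
    if b"\n" in data:
        return "0A"
    return "none"

# Hex view of a short fragment, similar in spirit to a hex dump of the field.
fragment_hex = b"Name\r\nJon".hex(" ").upper()
print(fragment_hex)  # 4E 61 6D 65 0D 0A 4A 6F 6E

# A line break written with hex escapes: either 0D0A or a lone 0A.
NL = r"(?:\x0D\x0A|\x0A)"
TOLERANT_RULE = rf"{NL}(?:Name|Full name)\s?{NL}([^\x0D\x0A]+){NL}"

lf_text = "Email header\nName\nJon Doe\nDesignation\nSuper Spy\n"
crlf_text = lf_text.replace("\n", "\r\n")

lf_names = re.findall(TOLERANT_RULE, lf_text)
crlf_names = re.findall(TOLERANT_RULE, crlf_text)
print(newline_hex(crlf_text.encode()), lf_names, crlf_names)
```

If the hex view of your own data shows 0D 0A between lines, the \x0D\x0A form is the one to try; if it shows only 0A, use \x0A.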
Could you please elaborate on how to check for 0A or 0D0A (using $hex.?) in VTA?