SASvtaUser
Calcite | Level 5

Hi all, I'm new to SAS and would like to extract entities from text. The predefined concepts are inaccurate, so I'm trying to use custom REGEX rules.

The sample text looks like this:

=====================

Email header
Name
Jon Doe
Designation
Super Spy

 

Email header2
Full name 
Sam Smith Junior
Designation
Super Spy2

=====================

 

So I defined a rule: REGEX:\n(?:Name|Full name)\s?\n[^\n]+\n

I intend to match based on:

1) Start with new line character \n

2) Non-capturing group (?:Name|Full name). [However, I noticed that "Name" or "Full name" is still being captured regardless]

3) Optional space character after Name or Full name

4) match a new line character \n

5) capture the full name using [^\n]+

6) End with newline character \n

This REGEX rule returns 0 matches; the issue seems to come from the newline character match.

Kindly advise, please!

 

4 REPLIES
SASvtaUser
Calcite | Level 5

Allow me to add some more context, please! The reason I'm trying regex instead of other means is that my source data is very unstructured, and I'm trying to use the REGEX rule type to capture very specific scenarios.

Ksharp
Super User

You could use
\x0D\x0A
or
\x0A
in place of \n to match the newline.
Try using the $hex. format to check whether your newline character is 0A (LF) or 0D0A (CRLF).
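
A minimal sketch of that suggestion, tried with the Base SAS PRX functions in a DATA step rather than in VTA itself. Whether the VTA REGEX rule type accepts the same escapes is an assumption that would need checking there. The variable names (text, name, rx) are illustrative, and the sample value is built with CRLF line endings purely for the test. Note that (?:...) only prevents the group from getting its own capture buffer; the overall match still includes "Name" or "Full name", so the name line gets its own capturing group here.

```sas
data _null_;
  length text $200 name $50;
  /* Sample text with Windows line endings (CRLF = '0D0A'x); assumed for illustration */
  text = 'Email header' || '0D0A'x || 'Name' || '0D0A'x ||
         'Jon Doe' || '0D0A'x || 'Designation';
  /* \x0D?\x0A accepts both LF and CRLF; the parentheses capture only the name line */
  rx = prxparse('/\x0D?\x0A(?:Name|Full name)\s?\x0D?\x0A([^\x0D\x0A]+)\x0D?\x0A/');
  if prxmatch(rx, text) then do;
    name = prxposn(rx, 1, text);  /* contents of capture buffer 1 */
    put name=;
  end;
  else put 'No match';
run;
```

If this matches in a DATA step but the same pattern still returns nothing in VTA, the line-ending bytes are probably not the only difference between the two regex engines.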

SASvtaUser
Calcite | Level 5

Could you elaborate on how to check 0A or 0D0A (using $hex.?) in VTA please?

Ksharp
Super User
????

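data _null_;
/* '0D0A'x is carriage return + line feed (CRLF) */
a='aa'||'0D0A'x||'bb';
put a= $hex32.; /* look for 0D0A between 6161 (aa) and 6262 (bb) */

/* for comparison: a plain space shows up as 20 */
a='aa'||' '||'bb';
put a= $hex32.;

run;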
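
For real data, the same check can be made by searching a text variable for the byte sequences directly instead of reading the hex dump by eye. A sketch under the assumption that the documents are available as a character variable in a SAS data set; here a constructed value (text) stands in for it, and has_crlf and has_lf are illustrative names.

```sas
data _null_;
  length text $100;
  /* Stand-in for a real document; built with CRLF for illustration */
  text = 'Name' || '0D0A'x || 'Jon Doe';
  /* INDEX returns the position of the first occurrence, or 0 if absent */
  has_crlf = (index(text, '0D0A'x) > 0);
  has_lf   = (index(text, '0A'x) > 0);
  put has_crlf= has_lf=;
run;
```

Because CRLF also contains 0A, has_lf is set in both cases; has_crlf is the flag that separates the two line-ending styles.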
