ybz12003
Rhodochrosite | Level 12

Hello:

 

Could someone explain what (?<!\d) and \d{4}(?!\d) mean in the following code? Thanks.

 

http://analytics.ncsu.edu/sesug/2012/CT-03.pdf

 

 

data SSN ;
input SSN $20. ;
datalines ;
123-54-2280
#987-65-4321
S.S. 666-77-8888
246801357
soc # 133-77-2000
ssnum 888_22-7779
919-555-4689
call me 1800123456
;
run ;

proc sql feedback ;
select ssn
/* mask each 9-digit SSN (optional - or _ separators) that is not part of a longer run of digits */
, prxchange( 's/(?<!\d)\d{3}[-_]?\d{2}[-_]?\d{4}(?!\d)/xxxxxxxxx/io', -1, ssn)
as ssn2
from ssn
;
quit ;

 

ACCEPTED SOLUTION
ChrisHemedinger
Community Manager

Here's how you can find out.

Visit https://regex101.com.

Paste in the main expression:

(?<!\d)\d{3}[-_]?\d{2}[-_]?\d{4}(?!\d)

See the Explanation field.

9 REPLIES
ballardw
Super User

Do you have any of the SAS documentation available?

ybz12003
Rhodochrosite | Level 12

The paper I linked above states that

"The negative look-behind (i.e. (?<!\d) ) and negative look-ahead assertions (i.e. (?!\d) ) are non-capturing."

However, I still don't get what the code means.

ybz12003
Rhodochrosite | Level 12

That is an awesome tool! Thank you so much, Chris.

kiranv_
Rhodochrosite | Level 12

(?<!\d) -- negative lookbehind

(?!\d) -- negative lookahead

These check whether particular characters are present next to the text you are matching, without becoming part of the match. Because they consume no characters, they are called zero-width assertions.

(?<! ... ) is a negative lookbehind: ?< means look behind, that is, check what comes immediately in front of (before) the text you are looking for, and ! means "not", AKA negative.

Let us use an easier example. Say you have the values goathair, cowhair and cowsomething, and you want to find hair, but not when goat is in front of it. Then you use something like this: (?<!goat)hair

(?<!\d) indicates that you do not want a digit (\d) immediately in front of the rest of the expression.
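To make the lookbehind example above concrete, here is a minimal sketch in a DATA step. It assumes the three sample values from the post; the variable names word and pos are only illustrative. PRXMATCH returns the position where the match starts, or 0 when there is no match.

```sas
data _null_ ;
  length word $12 ;
  /* loop over the sample values from the explanation above */
  do word = 'goathair', 'cowhair', 'cowsomething' ;
    /* match "hair" only when it is not immediately preceded by "goat" */
    pos = prxmatch('/(?<!goat)hair/', word) ;
    put word= pos= ;
  end ;
run ;
```

Only cowhair should report a non-zero position (4). goathair is rejected by the lookbehind, and cowsomething does not contain hair at all.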

 

 

(?! ... ) is a negative lookahead: it checks what comes immediately after (behind) the text you are looking for, and again ! means "not", AKA negative.

Let us take an easier example. Say you have the values hairyboy, goodboy and hairykid, and you want to find the values that contain hairy, but you do not want to pick hairyboy. Then you do something like this: hairy(?!boy)
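The lookahead example can be tried the same way. This is a sketch under the same assumptions as before: the sample values come from the post, and the variable names are illustrative.

```sas
data _null_ ;
  length word $12 ;
  do word = 'hairyboy', 'goodboy', 'hairykid' ;
    /* match "hairy" only when it is not immediately followed by "boy" */
    pos = prxmatch('/hairy(?!boy)/', word) ;
    put word= pos= ;
  end ;
run ;
```

Only hairykid should report a match (position 1). hairyboy is rejected by the lookahead, and goodboy does not contain hairy.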

 

 

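\d{4}(?!\d) -- \d{4} matches exactly four digits in a row, and (?!\d) requires that no digit comes right after them. Together with (?<!\d) at the start, this keeps the pattern from matching nine digits that sit inside a longer number.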
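One way to see what the two assertions contribute is to run the original pattern next to a copy with the assertions removed. The sketch below does that for three of the sample lines from the question; the variable names text, with_guard and without_guard are hypothetical.

```sas
data _null_ ;
  length text with_guard without_guard $20 ;
  do text = '246801357', 'call me 1800123456', '919-555-4689' ;
    /* original pattern: the 9 digits must not touch another digit on either side */
    with_guard    = prxchange('s/(?<!\d)\d{3}[-_]?\d{2}[-_]?\d{4}(?!\d)/xxxxxxxxx/', -1, text) ;
    /* same pattern without the lookbehind and lookahead */
    without_guard = prxchange('s/\d{3}[-_]?\d{2}[-_]?\d{4}/xxxxxxxxx/', -1, text) ;
    put text= with_guard= without_guard= ;
  end ;
run ;
```

The 9-digit value should be masked by both versions. The 10-digit phone number should be left alone by the guarded pattern, while the unguarded one masks its first nine digits and leaves a stray 6. The 3-3-4 phone number should match neither version, because the digit grouping itself does not fit.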

 

I tried my best to explain, please let me know if something is unclear, confusing or wrong

ybz12003
Rhodochrosite | Level 12

Thank you very much for your detailed explanation, Kiranv_

SAS_inquisitive
Lapis Lazuli | Level 10

@kiranv_ can you explain this?

 

been*? matches "bee". However, been+? matches "been".

I know *? matches the previous element zero or more times, so I thought been*? should match "been".

 

Thanks!

kiranv_
Rhodochrosite | Level 12

? has a different meaning when it comes right after + or *.

On their own, + and * are greedy: they keep matching as long as there is a possibility.

Say the word is "beennnnnnnnnn". If you say been+ (+ is one or more), it matches the whole "beennnnnnnnnn".

A ? right after + or * makes it non-greedy.

+? and *? are non-greedy. Non-greedy means the search stops once it fulfills the least condition, as long as the rest of the pattern can still match. For +? the least condition is one occurrence, and for *? it is zero.

Say the word is "beennnnnnnnnn" and you instruct the regex engine to find been+?. It picks up "been" (n is there only once; non-greedy here means it stops as soon as it has the first n).

Say the word is "beennnnnnnnnn" and you instruct the regex engine to find been*?. It picks up "bee" (n can occur zero times, so non-greedy here means it stops without taking any n).
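The greedy and non-greedy cases above can be compared side by side. This is a minimal sketch that assumes the same sample word; the variable names pattern, matched, re, pos and len are illustrative. CALL PRXSUBSTR returns the start position and length of the match, and SUBSTR then extracts the matched text.

```sas
data _null_ ;
  length pattern $12 matched $20 ;
  text = 'beennnnnnnnnn' ;
  do pattern = '/been+/', '/been+?/', '/been*/', '/been*?/' ;
    /* compile the pattern; strip() removes the padding blanks */
    re = prxparse(strip(pattern)) ;
    call prxsubstr(re, text, pos, len) ;
    if pos > 0 then matched = substr(text, pos, len) ;
    else matched = ' ' ;
    put pattern= matched= ;
    /* release the compiled pattern before the next iteration */
    call prxfree(re) ;
  end ;
run ;
```

The two greedy patterns should return the whole word, been+? should return been, and been*? should return bee.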

 

 

 

SAS_inquisitive
Lapis Lazuli | Level 10

@kiranv_ this makes sense. Thanks !
