Hello:
Could someone explain what (?<!\d) and \d{4}(?!\d) mean in the following code? Thanks.
http://analytics.ncsu.edu/sesug/2012/CT-03.pdf
data SSN ;
input SSN $20. ;
datalines ;
123-54-2280
#987-65-4321
S.S. 666-77-8888
246801357
soc # 133-77-2000
ssnum 888_22-7779
919-555-4689
call me 1800123456
;
run ;
proc sql feedback ;
select ssn
, prxchange( 's/(?<!\d)\d{3}[-_]?\d{2}[-_]?\d{4}(?!\d)/xxxxxxxxx/io', -1, ssn)
as ssn2
from ssn
;
quit ;
Here's how you can find out.
Visit https://regex101.com.
Paste in the main expression:
(?<!\d)\d{3}[-_]?\d{2}[-_]?\d{4}(?!\d)
See the Explanation field.
Do you have any of the SAS documentation available?
The paper I linked above states that:
The negative look-behind (i.e. (?<!\d) ) and negative look-ahead assertions (i.e. (?!\d) ) are non-capturing.
However, I still don't get what the code means.
That is an awesome tool! Thank you so much, Chris.
(?<!\d) -- negative lookbehind
(?!\d) -- negative lookahead
These constructs check whether particular characters are present next to the text you are matching, without making those characters part of the match. That is why they are called zero-width assertions.
?< indicates lookbehind: it checks what comes immediately before the text you are looking for.
! means "not", i.e. negative.
Let us use an easier example. Say you have the values goathair, cowhair and cowsomething, and you want to find hair
but not when goat comes right before it. Then you use something like this: (?<!goat)hair
(?<!\d) means that you do not want a digit (\d) immediately before the rest of the expression.
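To try the goat/hair example yourself, here is a minimal sketch. It uses Python's re module rather than SAS, on the assumption that it treats (?<!...) the same way as the Perl-style syntax used by the PRX functions; the variable names are only for illustration.

```python
import re

# Values from the example above.
words = ["goathair", "cowhair", "cowsomething"]

# Match "hair" only when it is NOT immediately preceded by "goat".
pattern = re.compile(r"(?<!goat)hair")

# Record, for each value, whether the pattern is found anywhere in it.
matches = {word: bool(pattern.search(word)) for word in words}

for word, found in matches.items():
    print(word, "->", "match" if found else "no match")
```

Only cowhair should be reported as a match: goathair contains hair, but goat sits right before it, and cowsomething has no hair at all.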
(?!...) is the negative lookahead form: it checks what comes immediately after the text you are looking for.
As before, ! means "not", i.e. negative.
Let us take an easier example. Say you have the values hairyboy, goodboy and hairykid, and you want to find the values containing hairy
but not hairyboy. Then you do something like this: hairy(?!boy)
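The hairy/boy example can be checked the same way. This is a small sketch in Python's re module, assuming it handles (?!...) like the Perl-style engine behind the SAS PRX functions; the names are illustrative.

```python
import re

# Values from the example above.
words = ["hairyboy", "goodboy", "hairykid"]

# Match "hairy" only when it is NOT immediately followed by "boy".
pattern = re.compile(r"hairy(?!boy)")

# Keep the values in which the pattern is found.
matched = [word for word in words if pattern.search(word)]
print(matched)

# The lookahead only inspects the following text; it is not part of the match.
matched_text = pattern.search("hairykid").group()
print(matched_text)
```

Only hairykid is kept, and the matched text is just hairy, because the lookahead consumes no characters.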
\d{4}(?!\d) -- \d{4} matches exactly four digits, and (?!\d) requires that the character right after them is not a digit.
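Putting the pieces together, the sketch below runs the full expression from the question over the same eight data lines. It is written in Python's re module instead of PROC SQL, on the assumption that the pattern behaves the same as in prxchange; re.sub replaces every match, which corresponds to the -1 argument in the SAS call.

```python
import re

# The expression from the question: 3 digits, optional - or _, 2 digits,
# optional - or _, 4 digits, with no digit directly before or after.
ssn_pattern = re.compile(r"(?<!\d)\d{3}[-_]?\d{2}[-_]?\d{4}(?!\d)")

# The data lines from the original DATA step.
rows = [
    "123-54-2280",
    "#987-65-4321",
    "S.S. 666-77-8888",
    "246801357",
    "soc # 133-77-2000",
    "ssnum 888_22-7779",
    "919-555-4689",
    "call me 1800123456",
]

# Replace every match in each row with a mask.
masked = [ssn_pattern.sub("xxxxxxxxx", row) for row in rows]

for before, after in zip(rows, masked):
    print(f"{before!r:25} -> {after!r}")
```

The last two rows stay unchanged. 919-555-4689 does not fit the 3-2-4 digit layout, and in 1800123456 the first nine digits are followed by a tenth digit, so (?!\d) rejects them, while any later starting position is rejected by (?<!\d).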
I tried my best to explain. Please let me know if anything is unclear, confusing or wrong.
Thank you very much for your detailed explanation, Kiranv_
@kiranv_ can you explain this?
been*? matches "bee". However, been+? matches "been".
I know *? matches the previous element zero or more times, so I thought been*? should match "been".
Thanks!
? has a different meaning when it comes right after + or *.
+ and * on their own are greedy: they match as many characters as they can.
Say the word is "beennnnnnnnnn". If you use been+ (+ is one or more), it matches the whole "beennnnnnnnnn".
Putting ? after + or * makes them non-greedy.
+? and *? are non-greedy. Non-greedy means the search stops as soon as it fulfills the minimum condition (for +? that is one occurrence, and for *? it is zero).
Say the word is "beennnnnnnnnn" and you tell the regex engine to find been+?. It picks up "been" (n is matched only once; non-greedy here means it stops after the first n).
Say the word is "beennnnnnnnnn" and you tell the regex engine to find been*?. It picks up "bee" (n can occur zero times, so a non-greedy match stops without taking any n).
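Here is a quick way to see all four quantifiers side by side. It is a sketch in Python's re module, assuming greedy and non-greedy quantifiers behave as in the Perl-style engine used by the SAS PRX functions. The last line is an extra illustration that is not from the thread: when something must follow the quantifier, the non-greedy form still extends as far as needed.

```python
import re

text = "beennnnnnnnnn"

# Greedy (+, *) versus non-greedy (+?, *?) versions of the same pattern.
patterns = ["been+", "been*", "been+?", "been*?"]
results = {p: re.search(p, text).group() for p in patterns}

for p, matched in results.items():
    print(f"{p:8} -> {matched}")

# Non-greedy means "as few as possible", not "always the minimum":
# here the engine must take every n before it can match the trailing s.
extended = re.search(r"been*?s", "beennns").group()
print(extended)
```

been+ and been* take the whole word, been+? stops at "been", and been*? stops at "bee". In the last line the match is the full "beennns", because *? has to keep adding n until s can match.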
@kiranv_ this makes sense. Thanks !