As a first step towards data cleaning:
How can I count the email addresses that have nothing but hyphens before the @, like this one?
------------------------------------@GMAIL.COM
You will probably need a regular expression:
data have;
input email :$char100.;
cards;
------------------------------------@GMAIL.COM
-------------@-------------.COM
---------@--.COM
;
data want;
set have;
if prxmatch("/^-+@/",email) then ind=1;
run;
proc print;run;
Haikuo
One way:
proc freq data=have;
tables email;
where index(email, '----');
run;
This prints a frequency table of every email address that contains four hyphens in a row anywhere within the address, not only before the @. If you want to lower the threshold below four, that's up to you. You know your data best.
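To see how "anywhere within the address" differs from "only hyphens before the @", here is a small sketch. The DEMO data set, the variable names ANY_RUN and ALL_HYPHENS, and the second address are made up purely for illustration; they are not from the original data.

```sas
data demo;
input email :$char100.;
/* 1 if four hyphens in a row appear anywhere in the address */
any_run = (index(email, '----') > 0);
/* 1 only if the address starts with one or more hyphens followed directly by @ */
all_hyphens = (prxmatch('/^-+@/', email) > 0);
cards;
------------------------------------@GMAIL.COM
JOHN----SMITH@GMAIL.COM
;
proc print data=demo;
run;
```

Both rows should be flagged by ANY_RUN, but only the first by ALL_HYPHENS, so which test to use depends on whether an address like the second one counts as bad data.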
Good luck.
I had to ask, what if you want to keep:
but to drop:
I still think the Regular expression is the way to go, at least for the first step.
my 2 cents,
Haikuo
Since you wanted strings that start with one or more hyphens followed directly by an @, I'll suggest adding one more character to Haikuo's code. The ^ at the beginning forces the match to begin at the start of the string:
data have;
informat email $50.;
input email;
cards;
------------------------------------@GMAIL.COM
-------------@-------------.COM
---------@--.COM
;
data want;
set have;
if prxmatch("/^-+@/",email) then ind=1;
run;
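The code above only sets a flag, while the original question asked for a count. One possible way to get the number directly is sketched below, assuming the HAVE data set from this thread; the column name N_HYPHEN_ONLY is just an illustrative choice.

```sas
proc sql;
/* count the rows whose address starts with one or more hyphens followed directly by @ */
select count(*) as n_hyphen_only
from have
where prxmatch('/^-+@/', email) > 0;
quit;
```

If you would rather keep the flag, running PROC FREQ on IND in the WANT data set should give the same number.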
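DATA WANT;
SET HAVE;
at = index(email,'@'); /* position of the first @, 0 if there is none */
/* keep a row only if something precedes the @ and every character before it is a hyphen; */
/* max() keeps the SUBSTR length valid when there is no @ or the @ comes first */
if at > 1 and countc(substr(email,1,max(at-1,1)),'-') = at-1;
drop at;
run;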