As a first step towards data cleaning:
How can I get a count of email addresses that have nothing but hyphens before the @, like the following?
------------------------------------@GMAIL.COM
You will probably need a regular expression:
data have;
input email :$char100.;
cards;
------------------------------------@GMAIL.COM
-------------@-------------.COM
---------@--.COM
;
data want;
set have;
if prxmatch("/^-+@/",email) then ind=1;
run;
proc print;run;
Haikuo
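Since the question asks for a count, here is a minimal sketch that turns the same pattern into a single number. It assumes the HAVE data set created above; the column name n_hyphen_only is made up for illustration.

```sas
/* Count rows where everything before the @ is one or more hyphens. */
/* Assumes the HAVE data set from the step above.                    */
proc sql;
  select count(*) as n_hyphen_only   /* illustrative column name */
  from have
  where prxmatch('/^-+@/', email) > 0;
quit;
```

With the three sample rows above, every address matches the pattern, so the count should be 3.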
One way:
proc freq data=have;
tables email;
where index(email, '----');
run;
This prints a frequency table of every email address that contains four dashes in a row, anywhere within the address. If you want to lower the threshold to fewer than four, that's up to you. You know your data best.
Good luck.
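Because INDEX finds the dashes anywhere in the address, an address with dashes in the middle would also be listed. One possible way to narrow the table to addresses that are hyphens all the way up to the @ is to swap the INDEX test for an anchored regular expression. This is only a sketch, assuming the same HAVE data set:

```sas
proc freq data=have;
  tables email;
  /* keep only addresses that start with one or more hyphens followed by @ */
  where prxmatch('/^-+@/', email) > 0;
run;
```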
I had to ask, what if you want to keep:
but to drop:
I still think the Regular expression is the way to go, at least for the first step.
my 2 cents,
Haikuo
Since you wanted strings that start with one or more hyphens followed by an @, I'll suggest adding one more character to Haikuo's code. The ^ at the beginning forces the match to begin at the start of the string:
data have;
informat email $50.;
input email;
cards;
------------------------------------@GMAIL.COM
-------------@-------------.COM
---------@--.COM
;
data want;
set have;
if prxmatch("/^-+@/",email) then ind=1;
run;
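To see what the ^ changes, here is a small sketch that runs the pattern with and without the anchor. The data set name anchor_demo, the two flag variables, and the second address are hypothetical and only there for illustration.

```sas
data anchor_demo;
  length email $50;
  input email;
  /* without ^ the hyphens may sit anywhere before the @ */
  unanchored = prxmatch('/-+@/', email) > 0;
  /* with ^ the hyphens must start at position 1 */
  anchored   = prxmatch('/^-+@/', email) > 0;
cards;
-----@GMAIL.COM
john-doe-@example.com
;
run;
proc print data=anchor_demo; run;
```

The first row is flagged by both patterns. The second row is flagged only by the unanchored one, because its single hyphen before the @ is preceded by other characters.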
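DATA WANT;
SET HAVE;
at = index(email,'@');
/* keep rows where every character before the @ is a hyphen; skip rows with no @ or nothing before it */
if at > 1 then
  if countc(substr(email,1,at-1),'-') = at-1 then output;
drop at;
run;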
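A variation on the same idea, offered as a sketch rather than a replacement, uses VERIFY, which returns the position of the first character that is not in a given list. If the first non-hyphen character is the @ itself, everything before it must be hyphens. The names want_verify and at are illustrative.

```sas
data want_verify;
  set have;
  at = index(email, '@');   /* position of the first @, 0 if there is none */
  /* VERIFY gives the position of the first character that is not a hyphen; */
  /* at > 1 rules out addresses with no @ or with nothing before it         */
  if at > 1 and verify(email, '-') = at;
  drop at;
run;
```

This avoids SUBSTR entirely, so there is no risk of passing it a zero or negative length when an address has no @.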