As a first step towards data cleaning, how can I get a count of email addresses that have nothing but hyphens before the @, like the following?
------------------------------------@GMAIL.COM
You will probably need a regular expression:
data have;
input email :$char100.;
cards;
------------------------------------@GMAIL.COM
-------------@-------------.COM
---------@--.COM
;
data want;
set have;
if prxmatch("/^-+@/",email) then ind=1;
run;
proc print;run;
Haikuo
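To go from the flag to an actual count, something like the following could work. This is only a sketch: it assumes the have data set created above, and the column name n_hyphen_only is made up for illustration.

```sas
/* Count the rows where only hyphens precede the @.          */
/* Assumes the HAVE data set from above; N_HYPHEN_ONLY is a  */
/* hypothetical column name.                                 */
proc sql;
  select count(*) as n_hyphen_only
  from have
  where prxmatch('/^-+@/', email) > 0;
quit;
```

With the three sample rows above, every address starts with hyphens followed directly by an @, so the count should come out as 3.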
One way:
proc freq data=have;
tables email;
where index(email, '----');
run;
It will print a table of every email address that contains four dashes in a row anywhere in the address, not only before the @. If you want a threshold lower than four, shorten the search string. You know your data best.
Good luck.
I had to ask, what if you want to keep:
but to drop:
I still think the Regular expression is the way to go, at least for the first step.
my 2 cents,
Haikuo
Since you wanted strings that start with one or more hyphens followed immediately by an @, I'd suggest adding one more character to Haikuo's code. The ^ at the beginning forces the match to occur at the start of the string:
data have;
informat email $50.;
input email;
cards;
------------------------------------@GMAIL.COM
-------------@-------------.COM
---------@--.COM
;
data want;
set have;
if prxmatch("/^-+@/",email) then ind=1;
run;
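To see how the tests discussed so far differ, here is a small side-by-side sketch. The addresses in demo are invented for illustration, and the data set and flag names are hypothetical.

```sas
/* Hypothetical addresses, used only to compare the three tests */
data demo;
  input email :$50.;
  cards;
----@GMAIL.COM
JOHN----DOE@GMAIL.COM
A-@GMAIL.COM
--@GMAIL.COM
;

data compare;
  set demo;
  /* '----' anywhere in the address */
  four_dashes = (index(email, '----') > 0);
  /* one or more hyphens directly before the @, anywhere in the string */
  unanchored = (prxmatch('/-+@/', email) > 0);
  /* only hyphens between the start of the string and the @ */
  anchored = (prxmatch('/^-+@/', email) > 0);
run;

proc print data=compare;
run;
```

Only the first and last addresses should be flagged by the anchored pattern. The INDEX test also picks up JOHN----DOE@GMAIL.COM and misses --@GMAIL.COM, while the unanchored pattern also picks up A-@GMAIL.COM.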
data want;
set have;
at = index(email, '@');
* keep rows where every character before the @ is a hyphen;
if at > 1 and countc(substrn(email, 1, at - 1), '-') = at - 1;
drop at;
run;