How can I find the most frequently occurring consecutive character in an email address? I have a dataset with one Email field. I want to create two more columns next to the Email field showing (1) which character occurs consecutively and (2) how many times it occurs in a row.
Thanks
Brute force. I don't know whether you could use a Perl regular expression for this, but that's an area I know very little about.
Key things to notice:
The CHAR() function retrieves the i-th character from a string.
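data have;
   /* Without a LENGTH statement, EMAIL takes the length of the first  */
   /* value (21 characters) and the second value would be truncated.   */
   length email $50;
   email='happygggg@hotmail.com'; output;
   email='HanDan2nnnng@gmail.com'; output;
run;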
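data want;
   set have;
   length max_char $1;
   max_seq=0; max_char=' ';
   /* Compare each character with the one before it.              */
   /* Note: MAX_SEQ counts every adjacent identical pair in the    */
   /* whole string, not only the longest run, and MAX_CHAR ends up */
   /* holding the last repeated character found.                   */
   do i=2 to length(email);
      if char(email, i-1)=char(email, i) then do;
         max_seq+1;
         max_char=char(email, i);
      end;
   end;
   /* REPEAT(x, n) returns n+1 copies of x */
   max_seq_letter = repeat(max_char, max_seq);
   drop i;
run;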
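On the Perl regular expression question: a backreference pattern such as `(.)\1+` matches a run of two or more identical characters, so finding the longest match gives the answer directly. The sketch below illustrates that idea in Python rather than SAS, so it can be checked outside a SAS session; the function name `longest_run_regex` is hypothetical. SAS's PRX functions use Perl-style syntax, so a similar pattern may work with PRXNEXT, but that has not been tested here.

```python
import re

# (.) captures one character; \1+ requires one or more further copies
# of that same character immediately after it.
RUN_PATTERN = re.compile(r"(.)\1+")


def longest_run_regex(email: str) -> tuple[str, int]:
    """Return (run, length) for the longest run of a repeated character.

    Returns ("", 0) when no character repeats. On ties the first run wins,
    because max() keeps the first maximal element it sees.
    """
    runs = [m.group(0) for m in RUN_PATTERN.finditer(email)]
    if not runs:
        return "", 0
    best = max(runs, key=len)
    return best, len(best)


for email in ["happygggg@hotmail.com", "HanDan2nnnng@gmail.com"]:
    print(email, *longest_run_regex(email))
```

The comparison is case-sensitive, as is the character comparison in the SAS code.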
Sample data, input and output.
Hi, for example: the headers below are the field names. Character Repeats and Counts of Repeat are what I want to derive from the email provided.
Example output based on the variable Email:
Email | Character Repeats | Counts of Repeat
happyggg@hotmail.com | ggg | 3
HanDan2nnnng@gmail.com | nnnn | 4
Thanks, hope it helps.
I suggest you check out the COUNTC function.
You could call this function once for each alphabetic character you want to search for.
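Note that a plain count cannot tell consecutive repeats apart from scattered ones. As a rough illustration in Python (with `str.count` standing in for a single-character, case-sensitive COUNTC call; both function names are hypothetical), the letter n appears six times in the second sample email, but its longest consecutive run is only four:

```python
def total_count(email: str, ch: str) -> int:
    # Counts every occurrence of ch anywhere in the string.
    return email.count(ch)


def longest_consecutive(email: str, ch: str) -> int:
    # Length of the longest unbroken run of ch.
    best = current = 0
    for c in email:
        current = current + 1 if c == ch else 0
        best = max(best, current)
    return best


email = "HanDan2nnnng@gmail.com"
print(total_count(email, "n"))          # 6
print(longest_consecutive(email, "n"))  # 4
```

So a COUNTC-based approach would still need a separate check for adjacency.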
Thank you
Thank you! This is what I needed. Instead of the two example emails, I want to use the emails from another dataset, say Work.query_for_query_EMAIL, so that the code runs through all the rows and gives me the results I need.
Thank you. I figured out the last question I had.
If more than one run of repeated characters can occur in the same string (as in happygggg, which contains both pp and gggg), it is useful to work with two separate counters: LEN for the length of the current run and MAX_SEQ for the longest run found so far (using Reeza's variable names in the code below):
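data want;
   set have;
   length max_seq_letter $20
          max_char $1;
   do i=2 to length(email);
      len=1;                     /* length of the current run */
      /* Advance I while the character matches its predecessor */
      do while(char(email, i)=char(email, i-1));
         len+1;
         i+1;
      end;
      /* MAX_SEQ starts missing for each row; MAX() ignores missing   */
      /* values, so only runs of 2+ qualify. Ties keep the first run. */
      if len>max(max_seq, 1) then do;
         max_seq=len;
         max_char=char(email, i-1);
      end;
   end;
   /* REPEAT(x, n) returns n+1 copies of x, hence MAX_SEQ-1 */
   if max_seq then max_seq_letter=repeat(max_char, max_seq-1);
   output;
   label max_seq='Counts of Repeat'
         max_seq_letter='Character Repeats';
   drop i len max_char;
run;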
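For checking the logic outside SAS, here is a rough Python transcription of the same two-counter loop. The function name `longest_run` is hypothetical, and indices are 0-based, so the outer loop starts at 1 instead of 2. It returns the repeated run and its length, or ("", 0) when no character repeats:

```python
def longest_run(email: str) -> tuple[str, int]:
    """Mirror of the LEN / MAX_SEQ loop: returns (repeated run, run length)."""
    max_seq = 1      # plays the role of max(max_seq, 1) in the SAS code
    max_char = ""
    i = 1
    while i < len(email):
        length = 1   # LEN: length of the current run
        # Advance i while the character matches its predecessor.
        while i < len(email) and email[i] == email[i - 1]:
            length += 1
            i += 1
        # Strict '>' keeps the first of several equally long runs.
        if length > max_seq:
            max_seq = length
            max_char = email[i - 1]
        i += 1
    if not max_char:
        return "", 0
    return max_char * max_seq, max_seq


for email in ["happygggg@hotmail.com", "HanDan2nnnng@gmail.com"]:
    run, count = longest_run(email)
    print(f"{email} | {run} | {count}")
# happygggg@hotmail.com | gggg | 4
# HanDan2nnnng@gmail.com | nnnn | 4
```

The pp run in happygggg is found first but is then replaced by the longer gggg run, which is exactly the case the separate LEN counter handles.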