How can I find the longest run of a consecutively repeated character in an email address? I have a dataset with one Email field. I want to create additional columns next to the Email field showing (1) which character occurs consecutively and (2) how many times it occurs.
Thanks
Brute force. I don't know if you could use a Perl regular expression for this, but that's an area I know very little about.
Key things to notice:
The CHAR() function retrieves the i-th character from a string.
data have;
  email='happygggg@hotmail.com'; output;
  email='HanDan2nnnng@gmail.com'; output;
run;
data want;
  set have;
  length max_char $1.;
  max_seq=0; max_char=' ';
  /* compare each character with the one before it */
  do i=2 to length(email);
    if char(email, i-1)=char(email, i) then do;
      max_seq+1;                 /* counts every adjacent equal pair in the string */
      max_char=char(email, i);   /* keeps the most recent repeated character */
    end;
  end;
  /* REPEAT(c, n) returns n+1 copies of c */
  max_seq_letter = repeat(max_char, max_seq);
  drop i;
run;
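On the Perl regular expression idea mentioned above, here is a rough sketch of how it might look with PRXPARSE and CALL PRXNEXT. It assumes the HAVE dataset from this post; the output dataset name want_prx, the helper variables (re, start, stop, pos, len) and the $20 length are placeholders I made up. The pattern (.)\1+ matches any character followed by one or more copies of itself, and the loop keeps the longest match. The comparison is case-sensitive. Treat it as a starting point rather than a tested solution.

```sas
data want_prx;
  set have;
  length max_seq_letter $20;           /* assumed width; widen for longer runs */
  re = prxparse('/(.)\1+/');           /* constant pattern: any char repeated at least once */
  start = 1;
  stop = length(email);                /* ignore trailing blanks */
  max_seq = 0;
  call prxnext(re, start, stop, email, pos, len);
  do while (pos > 0);
    if len > max_seq then do;          /* first longest run wins on ties */
      max_seq = len;
      max_seq_letter = substr(email, pos, len);
    end;
    call prxnext(re, start, stop, email, pos, len);
  end;
  drop re start stop pos len;
run;
```

If an email has no repeated characters, max_seq stays 0 and max_seq_letter stays blank.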
Please post sample data: input and expected output.
Hi, here is an example. The headers are field names; Character Repeats and Counts of Repeat are what I want to get based on the email provided.
Example Output based on variable Email.
Email | Character Repeats | Counts of Repeat
happyggg@hotmail.com | ggg | 3
HanDan2nnnng@gmail.com | nnnn | 4
Thanks, hope it helps.
I suggest you check out the COUNTC function:
You could call this function for each letter of the alphabet you wish to search for.
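A minimal sketch of what that could look like, assuming the HAVE dataset posted earlier in the thread (the output dataset and the variable names n_g and n_n_any_case are just illustrative). Note that COUNTC counts every occurrence of the listed characters in the string, whether or not they are next to each other, so on its own it does not identify a consecutive run.

```sas
data counts;
  set have;
  n_g = countc(email, 'g');               /* all occurrences of lowercase g */
  n_n_any_case = countc(email, 'n', 'i'); /* the 'i' modifier ignores case: counts n and N */
run;
```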
Thank you
Thank you! This is what I needed. But instead of the two example emails, I want to use emails from another dataset, let's say Work.query_for_query_EMAIL, so that the code runs through all the rows and gives me the results I need.
Thank you. I figured out the last question I had.
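For anyone landing here with the same question: the usual approach is to point the SET statement at the existing dataset instead of building HAVE by hand. Below is a rough sketch under a few assumptions: the dataset is Work.query_for_query_EMAIL as named above, its variable is called Email, and the $20 length is wide enough for the longest run. The loop is a variation on the logic in this thread that tracks the current run length in len, not a copy of anyone's posted code.

```sas
data want;
  set Work.query_for_query_EMAIL;       /* reads every row of the existing dataset */
  length max_seq_letter $20;            /* assumed width */
  max_seq = 0;
  len = 1;                              /* length of the run ending at the current position */
  do i = 2 to length(Email);
    if char(Email, i) = char(Email, i-1) then len = len + 1;
    else len = 1;
    if len > 1 and len > max_seq then do;
      max_seq = len;
      max_seq_letter = repeat(char(Email, i), len - 1);  /* REPEAT(c, n) gives n+1 copies */
    end;
  end;
  drop i len;
run;
```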
If more than one sequence of repeated characters can occur in the same string (as in happygggg), it is useful to work with two separate counters (LEN vs. MAX_SEQ in the code below, using Reeza's variable names):
data want;
  set have;
  length max_seq_letter $20
         max_char $1;
  do i=2 to length(email);
    len=1;
    do while(char(email, i)=char(email, i-1));
      len+1;
      i+1;
    end;
    if len>max(max_seq, 1) then do;
      max_seq=len;
      max_char=char(email, i-1);
    end;
  end;
  if max_seq then max_seq_letter=repeat(max_char, max_seq-1);
  output;
  label max_seq='Counts of Repeat'
        max_seq_letter='Character Repeats';
  drop i len max_char;
run;