Count the most frequently occurring consecutive character in Email

Frequent Contributor

How can I find the most frequently occurring consecutive character in an Email value? I have a dataset with a single Email field. I want to create two more columns next to the Email field showing (1) which character occurs consecutively, and (2) how many times it occurs.

 

Thanks



All Replies
Grand Advisor

Please post sample data showing the input and the expected output.

Frequent Contributor

Hi, for example: the headers below are field names. Character Repeats and Counts of Repeat are what I want to derive from the email provided.

 

Example Output based on variable Email.

 

Email                  | Character Repeats | Counts of Repeat
happyggg@hotmail.com   | ggg               | 3
HanDan2nnnng@gmail.com | nnnn              | 4

 

Thanks, hope it helps. 

Respected Advisor

I suggest you check out the COUNTC function:

 

http://support.sas.com/documentation/cdl/en/lefunctionsref/67960/HTML/default/viewer.htm#n1qcntq4r6p...

 

You could call this function once for each letter of the alphabet you wish to search for.
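
As a rough illustration of the COUNTC suggestion, the sketch below counts how often each letter occurs in email, ignoring case. Note that COUNTC counts all occurrences of a character, not only consecutive ones, so on its own it does not identify runs. The dataset name letter_counts and the variables letter, code, and n are made up for this example, and the loop from RANK('a') to RANK('z') assumes an ASCII session where the letters are contiguous.

```sas
/* Sketch only: one output row per email and letter that occurs in it. */
/* Assumes an input dataset HAVE with a character variable EMAIL.      */
data letter_counts;
  set have;
  length letter $1;
  do code = rank('a') to rank('z');      /* assumption: ASCII collating sequence */
    letter = byte(code);                 /* turn the code back into a character  */
    n = countc(email, letter, 'i');      /* 'i' modifier: ignore case            */
    if n > 0 then output;
  end;
  drop code;
run;
```

Because the counts are totals per string, a value such as 'gg.g' would report 3 for g even though the longest run is only 2.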

Frequent Contributor

Thank you

Solution
‎04-04-2016 03:33 PM
Grand Advisor

Brute force. I don't know whether you could use a Perl regular expression for this; that's an area I know very little about.

Key thing to notice:

The CHAR() function returns the i-th character of a string.

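data have;
  /* Set the length explicitly; otherwise it is taken from the first   */
  /* value (21 characters) and the longer second address is truncated. */
  length email $50;
  email='happygggg@hotmail.com'; output;
  email='HanDan2nnnng@gmail.com'; output;
run;

data want;
  set have;
  length max_char $1;
  max_seq=0; max_char=' ';
  do i=2 to length(email);
    /* Compare each character with the one before it. */
    if char(email, i-1)=char(email, i) then do;
      max_seq+1;                /* counts every adjacent equal pair in the string */
      max_char=char(email, i);  /* remembers the most recent repeated character   */
    end;
  end;

  /* REPEAT(x, n) returns n+1 copies of x. */
  max_seq_letter = repeat(max_char, max_seq);

  drop i;
run;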
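
On the Perl expression question raised above: SAS does provide Perl-style regular expressions through the PRX functions, so a pattern-based version seems feasible. The following is a minimal sketch, not a tested replacement for the brute-force step. It assumes the same HAVE dataset with an email variable; the dataset name want_prx and the helper variables re, start, stop, pos, and len are invented for the example. The pattern (.)\1+ matches any character followed by one or more copies of itself, and CALL PRXNEXT walks through the successive matches.

```sas
/* Sketch only: find the longest run of a repeated character with PRX. */
data want_prx;
  set have;
  length max_seq_letter $50;
  retain re;
  if _n_ = 1 then re = prxparse('/(.)\1+/');  /* compile the pattern once */

  start   = 1;
  stop    = length(email);
  max_seq = 0;

  /* PRXNEXT returns the position and length of the next match */
  /* and advances START past it; POS is 0 when nothing is left. */
  call prxnext(re, start, stop, email, pos, len);
  do while (pos > 0);
    if len > max_seq then do;                 /* keep the first longest run */
      max_seq        = len;
      max_seq_letter = substr(email, pos, len);
    end;
    call prxnext(re, start, stop, email, pos, len);
  end;

  drop re start stop pos len;
run;
```

Here max_seq holds the length of the longest run and max_seq_letter the run itself; both stay at 0 and blank when no character repeats.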
Frequent Contributor

Thank you! This is what I needed. However, instead of the two example emails, I want to use the emails from another dataset, say Work.query_for_query_EMAIL, so that the code runs through all of its rows and gives me the results I need.

Frequent Contributor

Thank you.  I figured out the last question I had.
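
The thread does not show how this was resolved. For readers with the same question, one plausible approach is to point the SET statement at the existing dataset instead of the two-row HAVE example. The sketch below assumes the column in Work.query_for_query_EMAIL is named email, which the thread does not confirm. It uses a running counter (run_len, a made-up name) that resets whenever the character changes, which is a small variation on the approaches posted in this thread rather than a copy of either one.

```sas
/* Sketch only: longest run of a repeated character for every row. */
data want;
  set Work.query_for_query_EMAIL;   /* assumption: it has a variable named EMAIL */
  length max_char $1 max_seq_letter $50;

  max_seq = 0;
  run_len = 1;                      /* length of the run ending at position i */
  do i = 2 to length(email);
    if char(email, i) = char(email, i-1) then run_len = run_len + 1;
    else run_len = 1;
    /* Only runs of two or more characters count as repeats. */
    if run_len > 1 and run_len > max_seq then do;
      max_seq  = run_len;
      max_char = char(email, i);
    end;
  end;

  /* REPEAT(x, n) returns n+1 copies, hence max_seq - 1. */
  if max_seq > 0 then max_seq_letter = repeat(max_char, max_seq - 1);

  drop i run_len max_char;
run;
```

If the email column has a different name, a RENAME= dataset option on the SET statement is one way to adapt the step without touching the rest of the code.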

 

Trusted Advisor

If more than one sequence of repeated characters can occur in the same string (as in happygggg), it is useful to work with two separate counters (LEN vs. MAX_SEQ in the code below, using Reeza's variable names):

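data want;
set have;
length max_seq_letter $20
       max_char $1;
do i=2 to length(email);
  len=1;                                      /* length of the current run */
  do while(char(email, i)=char(email, i-1));  /* extend the run while characters match */
    len+1;
    i+1;
  end;
  if len>max(max_seq, 1) then do;             /* a longer run of at least 2 characters */
    max_seq=len;
    max_char=char(email, i-1);                /* i now points just past the run */
  end;
end;
/* REPEAT(x, n) returns n+1 copies, hence max_seq-1. */
if max_seq then max_seq_letter=repeat(max_char, max_seq-1);
output;
label max_seq='Counts of Repeat'
      max_seq_letter='Character Repeats';
drop i len max_char;
run;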
☑ This topic is SOLVED.
