mlance
Calcite | Level 5


Hi,

I'm trying to pull out the most popular value from a text string and create a new variable that tells me what it is.

e.g.

'AAAABBBCCD' would give me a new variable with the value of 'A' in this instance.

Can anyone help please?

Thanks in advance.

9 REPLIES
jwillis
Quartz | Level 8

Dear mlance,

I do not know how many rows of data you are working with, how long the source variable is, or whether you want the new variable on the same row as the original variable. Are the values in the variable limited to the 26 letters of the alphabet?

RW9
Diamond | Level 26

Please clarify the requirements. There are many string functions, as can be found in the docs: http://support.sas.com/publishing/pubcat/chaps/59343.pdf

However, it's dependent on your specific scenario. A simple calculation on the A or B or C etc. is relatively straightforward, but it's probably unlikely that you want just that.

Amir
PROC Star

Hi,

The following gives "A":

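data _null_;
  length want achar $1;             /* without this, achar inherits the length of text and its padding blanks get counted */
  have='AAAABBBCCD';
  text=have;
  max=0;
  do until(lengthn(text)=0);
    achar=substr(text,1,1);         /* next character not yet tallied */
    tally=countc(trim(text),achar); /* how often it occurs in what is left */
    if tally gt max then
    do;
      want=achar;
      max=tally;
    end;
    text=compress(text,achar);      /* remove the character just tallied */
  end;
  put want=;
run;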
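
If the strings are held in a data set rather than hard-coded, the same loop could be run once per row to create the new variable. This is only a minimal sketch: the data set names (have, want) and variable names (string, most_common, max_count) are assumptions made for illustration, not anything taken from the original question.

```sas
/* Hypothetical input: one text value per row */
data have;
  input string $char20.;
  datalines;
AAAABBBCCD
XYYZZZ
;

data want(keep=string most_common max_count);
  set have;
  length most_common achar $1 text $20;
  text=string;
  max_count=0;
  do until(lengthn(text)=0);
    achar=substr(text,1,1);           /* next character not yet tallied */
    tally=countc(trim(text),achar);   /* occurrences in the remaining text */
    if tally gt max_count then
    do;
      most_common=achar;
      max_count=tally;
    end;
    text=compress(text,achar);        /* remove the character just tallied */
  end;
run;
```

Because the comparison uses gt, a tie is resolved in favour of the character that appears first in the string, and embedded blanks are tallied like any other character.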

Regards,

Amir.

data_null__
Jade | Level 19

I had pretty much the same idea as you, with slightly different functions. I also made a list for ties.

data _null_;
   length p $32;
   input s $char32.;
   ws=compress(s,' ');     /* do not count spaces */
   do while(not missing(ws));
      l=first(ws);
      c=countc(ws,l,'TI'); /* I modifier ignores case */
      if c gt m then do;   /* new maximum: restart the list */
         m=c;
         p=l;
      end;
      else if c eq m then p=catt(p,l); /* tie: add to the list */
      ws=compress(ws,l,'I'); /* I modifier ignores case */
   end;
   put 'NOTE: Most popular character(s) :' p 'frequency:' m;
   cards;
AA  aaBBBCCD
CcC  AA  AABBBCCD
lkdjLabnejndkijdidnd
;
run;
mlance
Calcite | Level 5

Thanks very much Amir this works how I wanted it to.

data_null__
Jade | Level 19

I'm wondering why, since Amir gave you the program that "works how you wanted", you did not mark the reply correct but merely helpful, which leaves your question "Not Answered".

mlance
Calcite | Level 5

Updated.

Ksharp
Super User
data temp;
 set sashelp.class(keep=name);
 do i=1 to length(name);
  char=upcase(char(name,i));
  output;
 end;
 drop i;
run;
proc freq data=temp order=freq noprint;
by name;
tables char /out=temp1(drop=percent) nopercent ;
run;
data want;
 set temp1;
 by name;
 if first.name;
run;

Xia Keshan

mlance
Calcite | Level 5

Thanks all for your help, I have a solution that works now.
