DATA Step, Macro, Functions and more

count different strings in separated list

Accepted Solution Solved
Reply
Regular Contributor
Posts: 229
Accepted Solution

count different strings in separated list

Hello all,

I got a question, my dataset looks like this:

NameAddressGenderTickets
JohnRoad StreetMA102;A102;A103;A104;A105;A105
JeffHigh StreetMA221;A222;A223;
KarlaLow StreetFA202;A104;A021;A124;

I want to count the distinct ticketcodes for each line so my result dataset will have an extra column saying John has 4 different tickets, Jeff 3 and Karla 4. The number of tickets can be unlimited.


Accepted Solutions
Solution
‎07-14-2015 06:06 AM
Respected Advisor
Posts: 3,894

Re: count different strings in separated list

...or good old RegEx:

data have;

  a ="D02;D03;D04;D05;D06;D07;D08;D09";

  output;

  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";

  output;

  a ="D02";

  output;

  a ="";

  output;

run;

data want;

  set have;

  count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');

run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,894

Re: count different strings in separated list

See below:

data sample;

  length tickets $200;

  tickets='A102;A102;A103;A104;A105;A105';

  tickets_ctn=countw(tickets,';','ot');

  output;

  tickets='A102';

  tickets_ctn=countw(tickets,';','ot');

  output;

  tickets='';

  tickets_ctn=countw(tickets,';','ot');

  output;

  stop;

run;

Super User
Super User
Posts: 7,405

Re: count different strings in separated list

Assuming the delimiter is always the same you can do it simply by counting the delimiter:

result=countc(tickets,";");

However I note that your first row is incorrect in the pattern.  If that really is your data, then maybe a count with scan would be better:

data want;

     set have;

     i=1;

     result=0;

     do while (scan(tickets,i,";") ne "");

          result=result+1;

     end;

run;

Regular Contributor
Posts: 229

Re: count different strings in separated list

Patrick,RW9 i need to count the different strings. In my first example there was a ticketcode twice, but i only want to count it once.

Regular Contributor
Posts: 229

Re: count different strings in separated list

RW9, i modified your code and now it works, thanks:

data test;

a ="D02;D03;D04;D05;D06;D07;D08;D09";

output;

a ="D02;D02;D04;D04;D06;D06;D08;D08";

output;

run;

data want;

     set test;

     i=1;

     result=0;

     do while (scan(a,i,";") ne "");

   last_scan = scan(a,i-1,";");

   scan = scan(a,i,";");

   if last_scan ne scan then result=result+1;

   else result = result;

   i = i+1;

     end;

run;

Super User
Super User
Posts: 7,405

Re: count different strings in separated list

Thats good, but it will only work if the duplicate is in the record before it.  If thats your data great, otherwise maybe look at normalising.

Solution
‎07-14-2015 06:06 AM
Respected Advisor
Posts: 3,894

Re: count different strings in separated list

...or good old RegEx:

data have;

  a ="D02;D03;D04;D05;D06;D07;D08;D09";

  output;

  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";

  output;

  a ="D02";

  output;

  a ="";

  output;

run;

data want;

  set have;

  count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');

run;

Super User
Posts: 9,682

Re: count different strings in separated list

Patrick,

What if

a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";


Code: Program

data have;
  a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";
  output;
  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
  output;
  a ="D02";
  output;
  a ="";
  output;
run;

data want;
  set have;
  array x{99999} $ 40 _temporary_;
  count=0;call missing(of x{*});
  do i=1 to countw(a,';');
   temp=scan(a,i,';');
   if temp not in x then do;count+1;x{count}=temp;end;
  end;
  drop i temp;
run;
Respected Advisor
Posts: 3,894

Re: count different strings in separated list

The OP said it's in sequence.

Super User
Super User
Posts: 7,405

Re: count different strings in separated list

Ah, ok.  Normalise your data then, it will make your life easier. 

data inter (drop=tickets);

     set have;

     length ticket $20;

     i=1;

     do while (scan(tickets,i,";") ne "");

          ticket=scan(tickets,i,";");

          output;

          i=i+1;

     end;

run;

/* Now you have data in a way that you can do aggregates on it */

proc sql;

     create table COUNTS as

     select     NAME,

                     ADDRESS,

                     GENDER,

                     count(distinct TICKET) as CNT

     from         INTER

     group by NAME,

                       ADDRESS,

                       GENDER;

quit;

Regular Contributor
Posts: 229

Re: count different strings in separated list

Yes its ordered sequentially. Thanks again

Regular Contributor
Posts: 229

Re: count different strings in separated list

i don't know RegEx but it works good! Smiley Happy

Respected Advisor
Posts: 3,894

Re: count different strings in separated list

Can be a bit hard to get into it but very worth learning SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

Search the Internet for Perl Regular Expressions. The syntax is not a SAS thing and there are tutorials around.

Contributor
Posts: 50

Re: count different strings in separated list

Hello,

There is so many ways to acomplish the desired output, here another example

data _null_;

      length list $ 200;

      tickets="A202;A104;A021;A124;";

      count=0;

      list="";

      do j=1 to count(tickets,";")+1;

            elem=strip(scan(tickets,j,";"));

            if findw(list,elem,' ', 'R') eq 0 and elem ne "" then do;

                  list=catx(" ",list,elem);

                  count=count+1;

            end;

      end;

      put count=;

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 359 views
  • 3 likes
  • 5 in conversation