Solved: Re: count different strings in separated list

Filipvdr · Posted 07-14-2015 04:48 AM

Hello all,

I got a question, my dataset looks like this:

Name	Address	Gender	Tickets
John	Road Street	M	A102;A102;A103;A104;A105;A105
Jeff	High Street	M	A221;A222;A223;
Karla	Low Street	F	A202;A104;A021;A124;

I want to count the distinct ticketcodes for each line so my result dataset will have an extra column saying John has 4 different tickets, Jeff 3 and Karla 4. The number of tickets can be unlimited.

Patrick · Posted 07-14-2015 06:06 AM

...or good old RegEx:

data have;

a ="D02;D03;D04;D05;D06;D07;D08;D09";

output;

a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";

output;

a ="D02";

output;

a ="";

output;

run;

data want;

set have;

count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');

run;

View solution in original post

Patrick · Posted 07-14-2015 04:57 AM

See below:

data sample;

length tickets $200;

tickets='A102;A102;A103;A104;A105;A105';

tickets_ctn=countw(tickets,';','ot');

output;

tickets='A102';

tickets_ctn=countw(tickets,';','ot');

output;

tickets='';

tickets_ctn=countw(tickets,';','ot');

output;

stop;

run;

RW9 · Posted 07-14-2015 04:59 AM

Assuming the delimiter is always the same you can do it simply by counting the delimiter:

result=countc(tickets,";");

However I note that your first row is incorrect in the pattern. If that really is your data, then maybe a count with scan would be better:

data want;

set have;

i=1;

result=0;

do while (scan(tickets,i,";") ne "");

result=result+1;

end;

run;

Filipvdr · Posted 07-14-2015 05:12 AM

Patrick,RW9 i need to count the different strings. In my first example there was a ticketcode twice, but i only want to count it once.

Filipvdr · Posted 07-14-2015 05:20 AM

RW9, i modified your code and now it works, thanks:

data test;

a ="D02;D03;D04;D05;D06;D07;D08;D09";

output;

a ="D02;D02;D04;D04;D06;D06;D08;D08";

output;

run;

data want;

set test;

i=1;

result=0;

do while (scan(a,i,";") ne "");

last_scan = scan(a,i-1,";");

scan = scan(a,i,";");

if last_scan ne scan then result=result+1;

else result = result;

i = i+1;

end;

run;

RW9 · Posted 07-14-2015 05:23 AM

Thats good, but it will only work if the duplicate is in the record before it. If thats your data great, otherwise maybe look at normalising.

Patrick · Posted 07-14-2015 06:06 AM

...or good old RegEx:

data have;

a ="D02;D03;D04;D05;D06;D07;D08;D09";

output;

a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";

output;

a ="D02";

output;

a ="";

output;

run;

data want;

set have;

count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');

run;

Ksharp · Posted 07-14-2015 08:32 AM

Patrick,

What if

a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";

Code: Program

data have;
  a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";
  output;
  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
  output;
  a ="D02";
  output;
  a ="";
  output;
run;
 
data want;
  set have;
  array x{99999} $ 40 _temporary_;
  count=0;call missing(of x{*});
  do i=1 to countw(a,';');
   temp=scan(a,i,';');
   if temp not in x then do;count+1;x{count}=temp;end;
  end;
  drop i temp;
run;

Patrick · Posted 07-14-2015 08:37 AM

The OP said it's in sequence.

RW9 · Posted 07-14-2015 05:22 AM

Ah, ok. Normalise your data then, it will make your life easier.

data inter (drop=tickets);

set have;

length ticket $20;

i=1;

do while (scan(tickets,i,";") ne "");

ticket=scan(tickets,i,";");

output;

i=i+1;

end;

run;

/* Now you have data in a way that you can do aggregates on it */

proc sql;

create table COUNTS as

select NAME,

ADDRESS,

GENDER,

count(distinct TICKET) as CNT

from INTER

group by NAME,

ADDRESS,

GENDER;

quit;

Filipvdr · Posted 07-14-2015 05:26 AM

Yes its ordered sequentially. Thanks again

Filipvdr · Posted 07-14-2015 06:13 AM

i don't know RegEx but it works good!

Patrick · Posted 07-14-2015 06:19 AM

Can be a bit hard to get into it but very worth learning SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

Search the Internet for Perl Regular Expressions. The syntax is not a SAS thing and there are tutorials around.

BrunoSilva · Posted 07-14-2015 06:23 AM

Hello,

There is so many ways to acomplish the desired output, here another example

data _null_;
 
      length list $ 200;
      tickets="A202;A104;A021;A124;";
 
      count=0;
      list="";
      do j=1 to count(tickets,";")+1;
            elem=strip(scan(tickets,j,";"));
 
            if findw(list,elem,' ', 'R') eq 0 and elem ne "" then do;
                  list=catx(" ",list,elem);
                  count=count+1;
            end;
      end;
      put count=;
 
run;

Code: Program

Classroom Training Available!