BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Filipvdr
Pyrite | Level 9

Hello all,

I got a question, my dataset looks like this:

NameAddressGenderTickets
JohnRoad StreetMA102;A102;A103;A104;A105;A105
JeffHigh StreetMA221;A222;A223;
KarlaLow StreetFA202;A104;A021;A124;

I want to count the distinct ticketcodes for each line so my result dataset will have an extra column saying John has 4 different tickets, Jeff 3 and Karla 4. The number of tickets can be unlimited.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

...or good old RegEx:

data have;

  a ="D02;D03;D04;D05;D06;D07;D08;D09";

  output;

  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";

  output;

  a ="D02";

  output;

  a ="";

  output;

run;

data want;

  set have;

  count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');

run;

View solution in original post

13 REPLIES 13
Patrick
Opal | Level 21

See below:

data sample;

  length tickets $200;

  tickets='A102;A102;A103;A104;A105;A105';

  tickets_ctn=countw(tickets,';','ot');

  output;

  tickets='A102';

  tickets_ctn=countw(tickets,';','ot');

  output;

  tickets='';

  tickets_ctn=countw(tickets,';','ot');

  output;

  stop;

run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Assuming the delimiter is always the same you can do it simply by counting the delimiter:

result=countc(tickets,";");

However I note that your first row is incorrect in the pattern.  If that really is your data, then maybe a count with scan would be better:

data want;

     set have;

     i=1;

     result=0;

     do while (scan(tickets,i,";") ne "");

          result=result+1;

     end;

run;

Filipvdr
Pyrite | Level 9

Patrick,RW9 i need to count the different strings. In my first example there was a ticketcode twice, but i only want to count it once.

Filipvdr
Pyrite | Level 9

RW9, i modified your code and now it works, thanks:

data test;

a ="D02;D03;D04;D05;D06;D07;D08;D09";

output;

a ="D02;D02;D04;D04;D06;D06;D08;D08";

output;

run;

data want;

     set test;

     i=1;

     result=0;

     do while (scan(a,i,";") ne "");

   last_scan = scan(a,i-1,";");

   scan = scan(a,i,";");

   if last_scan ne scan then result=result+1;

   else result = result;

   i = i+1;

     end;

run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Thats good, but it will only work if the duplicate is in the record before it.  If thats your data great, otherwise maybe look at normalising.

Patrick
Opal | Level 21

...or good old RegEx:

data have;

  a ="D02;D03;D04;D05;D06;D07;D08;D09";

  output;

  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";

  output;

  a ="D02";

  output;

  a ="";

  output;

run;

data want;

  set have;

  count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');

run;

Ksharp
Super User

Patrick,

What if

a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";


Code: Program

data have;
  a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";
  output;
  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
  output;
  a ="D02";
  output;
  a ="";
  output;
run;

data want;
  set have;
  array x{99999} $ 40 _temporary_;
  count=0;call missing(of x{*});
  do i=1 to countw(a,';');
   temp=scan(a,i,';');
   if temp not in x then do;count+1;x{count}=temp;end;
  end;
  drop i temp;
run;
Patrick
Opal | Level 21

The OP said it's in sequence.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, ok.  Normalise your data then, it will make your life easier. 

data inter (drop=tickets);

     set have;

     length ticket $20;

     i=1;

     do while (scan(tickets,i,";") ne "");

          ticket=scan(tickets,i,";");

          output;

          i=i+1;

     end;

run;

/* Now you have data in a way that you can do aggregates on it */

proc sql;

     create table COUNTS as

     select     NAME,

                     ADDRESS,

                     GENDER,

                     count(distinct TICKET) as CNT

     from         INTER

     group by NAME,

                       ADDRESS,

                       GENDER;

quit;

Filipvdr
Pyrite | Level 9

Yes its ordered sequentially. Thanks again

Filipvdr
Pyrite | Level 9

i don't know RegEx but it works good! Smiley Happy

Patrick
Opal | Level 21

Can be a bit hard to get into it but very worth learning SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

Search the Internet for Perl Regular Expressions. The syntax is not a SAS thing and there are tutorials around.

BrunoSilva
Quartz | Level 8

Hello,

There is so many ways to acomplish the desired output, here another example

data _null_;

      length list $ 200;

      tickets="A202;A104;A021;A124;";

      count=0;

      list="";

      do j=1 to count(tickets,";")+1;

            elem=strip(scan(tickets,j,";"));

            if findw(list,elem,' ', 'R') eq 0 and elem ne "" then do;

                  list=catx(" ",list,elem);

                  count=count+1;

            end;

      end;

      put count=;

run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 2049 views
  • 3 likes
  • 5 in conversation