BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Filipvdr
Pyrite | Level 9

Hello all,

I got a question, my dataset looks like this:

NameAddressGenderTickets
JohnRoad StreetMA102;A102;A103;A104;A105;A105
JeffHigh StreetMA221;A222;A223;
KarlaLow StreetFA202;A104;A021;A124;

I want to count the distinct ticketcodes for each line so my result dataset will have an extra column saying John has 4 different tickets, Jeff 3 and Karla 4. The number of tickets can be unlimited.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

...or good old RegEx:

data have;

  a ="D02;D03;D04;D05;D06;D07;D08;D09";

  output;

  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";

  output;

  a ="D02";

  output;

  a ="";

  output;

run;

data want;

  set have;

  count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');

run;

View solution in original post

13 REPLIES 13
Patrick
Opal | Level 21

See below:

data sample;

  length tickets $200;

  tickets='A102;A102;A103;A104;A105;A105';

  tickets_ctn=countw(tickets,';','ot');

  output;

  tickets='A102';

  tickets_ctn=countw(tickets,';','ot');

  output;

  tickets='';

  tickets_ctn=countw(tickets,';','ot');

  output;

  stop;

run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Assuming the delimiter is always the same you can do it simply by counting the delimiter:

result=countc(tickets,";");

However I note that your first row is incorrect in the pattern.  If that really is your data, then maybe a count with scan would be better:

data want;

     set have;

     i=1;

     result=0;

     do while (scan(tickets,i,";") ne "");

          result=result+1;

     end;

run;

Filipvdr
Pyrite | Level 9

Patrick,RW9 i need to count the different strings. In my first example there was a ticketcode twice, but i only want to count it once.

Filipvdr
Pyrite | Level 9

RW9, i modified your code and now it works, thanks:

data test;

a ="D02;D03;D04;D05;D06;D07;D08;D09";

output;

a ="D02;D02;D04;D04;D06;D06;D08;D08";

output;

run;

data want;

     set test;

     i=1;

     result=0;

     do while (scan(a,i,";") ne "");

   last_scan = scan(a,i-1,";");

   scan = scan(a,i,";");

   if last_scan ne scan then result=result+1;

   else result = result;

   i = i+1;

     end;

run;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Thats good, but it will only work if the duplicate is in the record before it.  If thats your data great, otherwise maybe look at normalising.

Patrick
Opal | Level 21

...or good old RegEx:

data have;

  a ="D02;D03;D04;D05;D06;D07;D08;D09";

  output;

  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";

  output;

  a ="D02";

  output;

  a ="";

  output;

run;

data want;

  set have;

  count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');

run;

Ksharp
Super User

Patrick,

What if

a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";


Code: Program

data have;
  a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";
  output;
  a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
  output;
  a ="D02";
  output;
  a ="";
  output;
run;

data want;
  set have;
  array x{99999} $ 40 _temporary_;
  count=0;call missing(of x{*});
  do i=1 to countw(a,';');
   temp=scan(a,i,';');
   if temp not in x then do;count+1;x{count}=temp;end;
  end;
  drop i temp;
run;
Patrick
Opal | Level 21

The OP said it's in sequence.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, ok.  Normalise your data then, it will make your life easier. 

data inter (drop=tickets);

     set have;

     length ticket $20;

     i=1;

     do while (scan(tickets,i,";") ne "");

          ticket=scan(tickets,i,";");

          output;

          i=i+1;

     end;

run;

/* Now you have data in a way that you can do aggregates on it */

proc sql;

     create table COUNTS as

     select     NAME,

                     ADDRESS,

                     GENDER,

                     count(distinct TICKET) as CNT

     from         INTER

     group by NAME,

                       ADDRESS,

                       GENDER;

quit;

Filipvdr
Pyrite | Level 9

Yes its ordered sequentially. Thanks again

Filipvdr
Pyrite | Level 9

i don't know RegEx but it works good! Smiley Happy

Patrick
Opal | Level 21

Can be a bit hard to get into it but very worth learning SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition

Search the Internet for Perl Regular Expressions. The syntax is not a SAS thing and there are tutorials around.

BrunoSilva
Quartz | Level 8

Hello,

There is so many ways to acomplish the desired output, here another example

data _null_;

      length list $ 200;

      tickets="A202;A104;A021;A124;";

      count=0;

      list="";

      do j=1 to count(tickets,";")+1;

            elem=strip(scan(tickets,j,";"));

            if findw(list,elem,' ', 'R') eq 0 and elem ne "" then do;

                  list=catx(" ",list,elem);

                  count=count+1;

            end;

      end;

      put count=;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1301 views
  • 3 likes
  • 5 in conversation