Hello all,
I got a question, my dataset looks like this:
Name | Address | Gender | Tickets |
---|---|---|---|
John | Road Street | M | A102;A102;A103;A104;A105;A105 |
Jeff | High Street | M | A221;A222;A223; |
Karla | Low Street | F | A202;A104;A021;A124; |
I want to count the distinct ticketcodes for each line so my result dataset will have an extra column saying John has 4 different tickets, Jeff 3 and Karla 4. The number of tickets can be unlimited.
...or good old RegEx:
data have;
a ="D02;D03;D04;D05;D06;D07;D08;D09";
output;
a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
output;
a ="D02";
output;
a ="";
output;
run;
data want;
set have;
count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');
run;
See below:
data sample;
length tickets $200;
tickets='A102;A102;A103;A104;A105;A105';
tickets_ctn=countw(tickets,';','ot');
output;
tickets='A102';
tickets_ctn=countw(tickets,';','ot');
output;
tickets='';
tickets_ctn=countw(tickets,';','ot');
output;
stop;
run;
Assuming the delimiter is always the same you can do it simply by counting the delimiter:
result=countc(tickets,";");
However I note that your first row is incorrect in the pattern. If that really is your data, then maybe a count with scan would be better:
data want;
set have;
i=1;
result=0;
do while (scan(tickets,i,";") ne "");
result=result+1;
end;
run;
Patrick,RW9 i need to count the different strings. In my first example there was a ticketcode twice, but i only want to count it once.
RW9, i modified your code and now it works, thanks:
data test;
a ="D02;D03;D04;D05;D06;D07;D08;D09";
output;
a ="D02;D02;D04;D04;D06;D06;D08;D08";
output;
run;
data want;
set test;
i=1;
result=0;
do while (scan(a,i,";") ne "");
last_scan = scan(a,i-1,";");
scan = scan(a,i,";");
if last_scan ne scan then result=result+1;
else result = result;
i = i+1;
end;
run;
Thats good, but it will only work if the duplicate is in the record before it. If thats your data great, otherwise maybe look at normalising.
...or good old RegEx:
data have;
a ="D02;D03;D04;D05;D06;D07;D08;D09";
output;
a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
output;
a ="D02";
output;
a ="";
output;
run;
data want;
set have;
count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');
run;
Patrick,
What if
a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";
data have;
a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";
output;
a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
output;
a ="D02";
output;
a ="";
output;
run;
data want;
set have;
array x{99999} $ 40 _temporary_;
count=0;call missing(of x{*});
do i=1 to countw(a,';');
temp=scan(a,i,';');
if temp not in x then do;count+1;x{count}=temp;end;
end;
drop i temp;
run;
The OP said it's in sequence.
Ah, ok. Normalise your data then, it will make your life easier.
data inter (drop=tickets);
set have;
length ticket $20;
i=1;
do while (scan(tickets,i,";") ne "");
ticket=scan(tickets,i,";");
output;
i=i+1;
end;
run;
/* Now you have data in a way that you can do aggregates on it */
proc sql;
create table COUNTS as
select NAME,
ADDRESS,
GENDER,
count(distinct TICKET) as CNT
from INTER
group by NAME,
ADDRESS,
GENDER;
quit;
Yes its ordered sequentially. Thanks again
i don't know RegEx but it works good!
Can be a bit hard to get into it but very worth learning SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition
Search the Internet for Perl Regular Expressions. The syntax is not a SAS thing and there are tutorials around.
Hello,
There is so many ways to acomplish the desired output, here another example
|
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.