Hello all,
I got a question, my dataset looks like this:
Name | Address | Gender | Tickets |
---|---|---|---|
John | Road Street | M | A102;A102;A103;A104;A105;A105 |
Jeff | High Street | M | A221;A222;A223; |
Karla | Low Street | F | A202;A104;A021;A124; |
I want to count the distinct ticketcodes for each line so my result dataset will have an extra column saying John has 4 different tickets, Jeff 3 and Karla 4. The number of tickets can be unlimited.
...or good old RegEx:
data have;
a ="D02;D03;D04;D05;D06;D07;D08;D09";
output;
a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
output;
a ="D02";
output;
a ="";
output;
run;
data want;
set have;
count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');
run;
See below:
data sample;
length tickets $200;
tickets='A102;A102;A103;A104;A105;A105';
tickets_ctn=countw(tickets,';','ot');
output;
tickets='A102';
tickets_ctn=countw(tickets,';','ot');
output;
tickets='';
tickets_ctn=countw(tickets,';','ot');
output;
stop;
run;
Assuming the delimiter is always the same you can do it simply by counting the delimiter:
result=countc(tickets,";");
However I note that your first row is incorrect in the pattern. If that really is your data, then maybe a count with scan would be better:
data want;
set have;
i=1;
result=0;
do while (scan(tickets,i,";") ne "");
result=result+1;
end;
run;
Patrick,RW9 i need to count the different strings. In my first example there was a ticketcode twice, but i only want to count it once.
RW9, i modified your code and now it works, thanks:
data test;
a ="D02;D03;D04;D05;D06;D07;D08;D09";
output;
a ="D02;D02;D04;D04;D06;D06;D08;D08";
output;
run;
data want;
set test;
i=1;
result=0;
do while (scan(a,i,";") ne "");
last_scan = scan(a,i-1,";");
scan = scan(a,i,";");
if last_scan ne scan then result=result+1;
else result = result;
i = i+1;
end;
run;
Thats good, but it will only work if the duplicate is in the record before it. If thats your data great, otherwise maybe look at normalising.
...or good old RegEx:
data have;
a ="D02;D03;D04;D05;D06;D07;D08;D09";
output;
a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
output;
a ="D02";
output;
a ="";
output;
run;
data want;
set have;
count=countc(prxchange('s/(\w+\b)(;?\1)*/1/oi',-1,a),'1');
run;
Patrick,
What if
a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";
data have;
a ="D03;D02;D03;D04;D05;D06;D07;D08;D09";
output;
a ="D02;D02;D04;D04;D06;D06;D06;D08;D08";
output;
a ="D02";
output;
a ="";
output;
run;
data want;
set have;
array x{99999} $ 40 _temporary_;
count=0;call missing(of x{*});
do i=1 to countw(a,';');
temp=scan(a,i,';');
if temp not in x then do;count+1;x{count}=temp;end;
end;
drop i temp;
run;
The OP said it's in sequence.
Ah, ok. Normalise your data then, it will make your life easier.
data inter (drop=tickets);
set have;
length ticket $20;
i=1;
do while (scan(tickets,i,";") ne "");
ticket=scan(tickets,i,";");
output;
i=i+1;
end;
run;
/* Now you have data in a way that you can do aggregates on it */
proc sql;
create table COUNTS as
select NAME,
ADDRESS,
GENDER,
count(distinct TICKET) as CNT
from INTER
group by NAME,
ADDRESS,
GENDER;
quit;
Yes its ordered sequentially. Thanks again
i don't know RegEx but it works good!
Can be a bit hard to get into it but very worth learning SAS(R) 9.4 Functions and CALL Routines: Reference, Third Edition
Search the Internet for Perl Regular Expressions. The syntax is not a SAS thing and there are tutorials around.
Hello,
There is so many ways to acomplish the desired output, here another example
|
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.