Dear @AnnaKM,
Many thanks for your valuable feedback. I’m glad to read that you were able to accomplish your specific counting task, but sad to learn that my code did not work for your real data. I am very interested in the root cause of this issue. Could you please provide a little more information?
What kind of error message did you get, if any?
Did you use the code in a form other than a plain DATA step (data want; set have; ...)?
Was there anything special about the 28th observation that you mentioned?
Meanwhile, I have successfully tested my algorithm on several simulated datasets comprising a total of more than 220 million observations (including all possible combinations of 16 values in {., 10, 11}). In each case, I compared the results with those obtained by @Astounding’s algorithm (generalized from 3 to 8 counters, because 16 weeks can contain up to 8 separate runs of 11s, as required for certain input data).
The only difference I observed was that @Astounding’s algorithm is about 2.3 times faster than mine (apparently because my algorithm uses character functions). The results, however, were the same in all cases. Full details can be found below and in the attachment.
Best regards,
Reinhard
Here is the code I used for the tests:
/* Macros to create test data */

%macro create(out=have, values=, seed=);
  %local i;
  /* All n**16 possible combinations of Week1, ..., Week16 values in the n-element set {&VALUES}
     in lexicographic order */
  data &out;
    call streaminit(&seed);
    length IDnumber $10;
    %do i=1 %to 16;
      do Week&i = &values;
    %end;
    seqno+1;
    r=rand('UNIFORM');
    IDnumber=put(seqno,10.);
    output;
    %do i=1 %to 16;
      end;
    %end;
    drop seqno;
  run;
%mend create;

%macro sort(data=have, out=have, order=);
  %local i;
  /* reverse lexicographic order */
  %if %upcase(&order)=REVLEX %then %do;
    proc sort data=&data out=&out;
      by
      %do i=1 %to 16;
        descending Week&i
      %end;; /* first semicolon closes %END, second ends the BY statement */
    run;
  %end;
  /* random order */
  %else %if %upcase(&order)=RANDOM %then %do;
    proc sort data=&data out=&out;
      by r; /* R is the uniform random number created by %CREATE */
    run;
  %end;
%mend sort;

/* Macro to apply algorithms and compare results */

%macro comp(data=have);
  /* Algorithm proposed by FreelanceReinhard, unchanged */
  data wantR;
    set &data;
    array Week[16];
    array c[8];
    length _s $16;
    do i=1 to 16;
      substr(_s,i)=ifc(Week[i]=11, 'X', ' ');
    end;
    do i=1 by 1;
      _c=lengthn(scan(_s,i));
      if ~_c then return;
      c[i]=_c;
    end;
    drop i _s _c;
  run;

  /* Reference algorithm proposed by Astounding, with two minor modifications:
     1) Increased dimension of array C from 3 to 8 in order to allow for all possible
        input data vectors, adapted inequalities for C_SUBSCRIPT accordingly (3 --> 8).
     2) Inserted DROP statement for temporary variables.
  */
  data wantA;
    set &data;
    array week {16};
    array c {8};
    c_subscript=0;
    count_11=0;
    do J=1 to 16;
      if week{J}=11 then count_11 + 1;
      else do;
        if count_11 > 0 then do;
          c_subscript + 1;
          if (1 <= c_subscript <= 8) then c{c_subscript} = count_11;
        end;
        count_11=0;
      end;
    end;
    c_subscript + 1;
    if count_11 > 0 and (1 <= c_subscript <= 8) then c{c_subscript} = count_11;
    drop c_subscript count_11 j;
  run;

  /* Compare result datasets */
  proc compare data=wantR c=wantA;
  run;
%mend comp;

options nodate nonumber;
title 'All 2**16=65536 possible combinations of Week1, ..., Week16 values in {., 11}';
title2 'in lexicographic order';
%create(values=%str(., 11), seed=271828)
%comp;
title 'All 2**16 combinations of {., 11} in reverse lexicographic order';
%sort(order=revlex)
%comp;
title 'All 2**16 combinations of {., 11} in random order';
%sort(order=random)
%comp;
title 'All 3**16=43,046,721 possible combinations of Week1, ..., Week16 values in {., 10, 11}';
title2 'in lexicographic order';
%create(values=%str(., 10, 11), seed=314159) /* CAUTION: Dataset HAVE is 6.10 GB large. */
%comp; /* Datasets wantR and wantA are 8.67 GB each! */
title 'All 3**16 combinations of {., 10, 11} in reverse lexicographic order';
%sort(order=revlex)
%comp;
title 'All 3**16 combinations of {., 10, 11} in random order';
%sort(order=random)
%comp;
title '100,000,000 random vectors (Week1, ..., Week16) with components in {., 11, 12}';
data have;
  call streaminit(577219);
  length IDnumber $10;
  array v[3] _temporary_ (. 11 12);
  array Week[16];
  do i=1 to 1e8;
    IDnumber=put(i,10.);
    do j=1 to 16;
      Week[j]=v[rand('TABLE',1/3,1/3,1/3)];
    end;
    output;
  end;
  drop i j;
run; /* Dataset HAVE is 13.4 GB large! */

/* Check for consecutive duplicate records */
data dups;
  set have;
  by week1-week16 notsorted;
  if ~(first.week16 & last.week16);
run; /* 8 obs. (4 pairs) */

%comp; /* Datasets wantR and wantA are 19.4 GB each! */
title;
The attached .zip file contains the SAS log (Comparisons.log, 13 KB) and the output (Comparisons.lst, 7 KB) created by the above program.
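
For a quick check that does not need the large test datasets, the counting logic can also be tried on a few vectors that are easy to verify by hand. The following is only a minimal sketch: the dataset names MINI and WANT_MINI and the three input rows are made up for illustration and were not part of the tests above. The second DATA step repeats the logic of the wantR step.

```sas
/* Hypothetical mini dataset: three hand-checkable rows (assumption, not from the tests) */
data mini;
  input IDnumber :$10. Week1-Week16;
  datalines;
A 11 11 . 10 11 . . 11 11 11 . . . . . 11
B . . . . . . . . . . . . . . . .
C 11 . 11 . 11 . 11 . 11 . 11 . 11 . 11 .
;
run;

/* Same counting logic as in the wantR step of the test program */
data want_mini;
  set mini;
  array Week[16];
  array c[8];
  length _s $16;
  /* Build a 16-character string: 'X' where Week[i]=11, blank otherwise */
  do i=1 to 16;
    substr(_s,i)=ifc(Week[i]=11, 'X', ' ');
  end;
  /* Each blank-delimited word is one run of 11s; its length is the run length */
  do i=1 by 1;
    _c=lengthn(scan(_s,i));
    if ~_c then return; /* no more runs: write the observation and stop */
    c[i]=_c;
  end;
  drop i _s _c;
run;

proc print data=want_mini noobs;
  var IDnumber c1-c8;
run;

/* Expected when counted by hand:
   A: c1=2 c2=1 c3=3 c4=1, c5-c8 missing
   B: c1-c8 all missing
   C: c1-c8 all equal to 1 (the maximum of 8 runs in 16 weeks) */
```

Once the %comp macro from the test program has been compiled, %comp(data=mini) should run both algorithms on the same three rows and compare their results with PROC COMPARE.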