I have a problem with the following calculation. I want to calculate the lengths of the sequences of 0s and 1s in each row. I have the following data:
m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 |
0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
where m1-m15 are month numbers.
I would like the length of each sequence of zeros until a sequence of ones starts, then the length of that sequence of ones until the next sequence of zeros starts, and so on. This is what I want to get:
m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 | First 0 sequence | First 1 sequence | Second 0 sequence | Second 1 sequence | Third 0 sequence |
0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 3 | 3 | 2 | 2 |
0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 5 | 4 | 2 | 4 | 0 |
0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 2 | 2 | 5 | 2 |
0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 4 |
0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 3 | 1 | 1 | 1 |
The first sequence is always the zero sequence (though its length differs from row to row).
I will be grateful for any help.
Are you going to need to generalize this to more than 15 M variables?
Yes, I am: up to 204 M variables.
Here is a solution that is not limited by the number of variables:
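data have;
    input m1-m15;
    datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* Reshape to long format: one observation per (row, month) value */
data temp;
    set have;
    array v01{*} _numeric_;
    row = _n_;
    do i = 1 to dim(v01);
        m = v01{i};
        output;
    end;
    keep row m;
run;

/* Count the length of each run within a row. */
/* NOTSORTED lets first.m / last.m mark the boundaries of each run. */
data seq;
    length seqId $16;
    set temp; by row m notsorted;
    if first.row then seqNo = 1;
    if first.m then seqN = 0;
    seqN + 1;
    if last.m then do;
        seqId = cats("S", seqNo, "_", m);   /* S1_0, S1_1, S2_0, S2_1, ... */
        output;
        seqNo + m;                          /* advance after each run of 1s */
    end;
    keep row seqId seqN;
run;

/* Back to wide format: one column per run, named by seqId */
proc transpose data=seq out=seqT(drop=_: row);
    by row;
    id seqId;
run;

/* Attach the run lengths to the original data */
data want;
    set have; set seqT;
run;

proc print data=want noobs; run;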
PG
A BIG thank you, PGStats.
Here is another possibility; the raw input was stolen from PG's post.
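data have;
    input m1-m15;
    datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* Upper bound on the number of runs of each value: ceil(number of variables / 2) */
proc sql noprint;
    select ceil(nvar/2) into :dim trimmed
        from dictionary.tables
        where libname='WORK' and memname='HAVE';
quit;

data want;
    array zero(&dim) Zero1-Zero&dim;
    array one(&dim) One1-One&dim;
    /* Without an explicit length, a CATS result defaults to 200 bytes, */
    /* which is too short once there are more than 200 M variables.     */
    length _cat $ 1024;
    set have;
    array m m:;
    /* Concatenate the row into one string, e.g. 011100011001100 */
    _cat = cats(of m(*));
    do _i_ = 1 to &dim until (missing(_check));
        /* With '1' as the delimiter, the i-th word is the i-th run of 0s, and vice versa */
        zero(_i_) = ifn(lengthn(scan(_cat,_i_,'1')), lengthn(scan(_cat,_i_,'1')), .);
        one(_i_) = ifn(lengthn(scan(_cat,_i_,'0')), lengthn(scan(_cat,_i_,'0')), .);
        /* Missing only when neither a run of 0s nor a run of 1s is left */
        _check = sum(zero(_i_), one(_i_));
    end;
    drop _:;
run;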
Haikuo
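To see why the SCAN approach works, it may help to look at a single row in isolation. The following is only a small illustrative sketch (the variable names s, piece and len are made up for this example): with '1' as the delimiter, SCAN returns the runs of 0s one at a time, and LENGTHN gives their lengths.

```sas
/* Illustration only: split one row, written as a string, into its runs of 0s. */
/* The names s, piece and len are hypothetical and not part of the solution.   */
data _null_;
    length s piece $ 15;
    s = '011100011001100';            /* first row of HAVE, concatenated    */
    do i = 1 to 4;
        piece = scan(s, i, '1');      /* i-th run of 0s ('1' acts as delimiter) */
        len = lengthn(piece);         /* length of that run                 */
        put i= piece= len=;
    end;
run;
```

For this row the log shows run lengths 1, 3, 2 and 2. A fifth call would return a blank word, for which LENGTHN is 0; that is why the solution above converts zero lengths to missing values.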
Thank you, Hai.kuo
You already have two good solutions, so I'll just make two comments:
1) A sequence of 0s until a 1 (or of 1s until a 0) is called a "run." The number of runs in a sequence (and the lengths of the runs) can be used as a test for statistical randomness. See "How to tell whether a sequence of heads and tails is random" on The DO Loop.
2) Mathematically, the simplest way to count the length of runs is to use a trick that I call the DIF-SIGN trick. It is used in the previously mentioned article to compute the length of runs, and it is explained in more detail in the article "Using finite differences to estimate the maximum of a time series" on The DO Loop. Someone can probably convert the technique to a DATA step, which ought to be very fast.
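Along those lines, here is a minimal single-pass DATA step sketch. It is not the DIF-SIGN trick itself, just a plain loop that compares each value with its right-hand neighbour, which is the same idea in spirit: a run ends wherever the difference between neighbours is nonzero. It assumes that HAVE is the data set from the solutions above, that it has no missing values, and that every row starts with 0 (as stated in the question). The macro variables nvars and nruns, the data set name want_runs, and the output names Zero1.. and One1.. are made up for this example.

```sas
/* Single-pass sketch: count runs of 0s and 1s with arrays.              */
/* Assumptions: HAVE holds m1-m&nvars with values 0/1, no missing values, */
/* and every row starts with 0. All new names here are hypothetical.      */
%let nvars = 15;                          /* number of M variables        */
%let nruns = %sysevalf(&nvars/2, ceil);   /* most runs of one value       */

data want_runs;
    set have;
    array mm{*} m1-m&nvars;
    array zero{&nruns} Zero1-Zero&nruns;
    array one{&nruns}  One1-One&nruns;
    _k = 1;      /* index of the current pair (run of 0s, run of 1s) */
    _len = 0;    /* length of the run currently being counted        */
    do _i = 1 to dim(mm);
        _len = _len + 1;
        /* A run ends at the last variable or where the next value differs. */
        /* Two steps, so mm{_i + 1} is never read past the end of the array. */
        _end = (_i = dim(mm));
        if not _end then _end = (mm{_i + 1} ne mm{_i});
        if _end then do;
            if mm{_i} = 0 then zero{_k} = _len;
            else do;
                one{_k} = _len;
                _k = _k + 1;   /* next pair starts after a run of 1s */
            end;
            _len = 0;
        end;
    end;
    drop _:;
run;
```

For the first row this gives Zero1=1, One1=3, Zero2=3, One2=2, Zero3=2, One3=2 and Zero4=2, which agrees with the first five values in the requested table. To go from 15 to 204 variables, only nvars should need to change.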
Thank you, Rick for your valuable comments and article! That's interesting.