I have a problem calculating the following: I want the length of each sequence of 0s and 1s in each row. I have the following data:
m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 |
0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
where m is the month number.
I would like the length of each sequence of zeros up to the point where a sequence of ones starts, then the length of that sequence of ones until the next sequence of zeros starts, and so on. This is what I want to get:
m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 | First 0 sequence | First 1 sequence | Second 0 sequence | Second 1 sequence | Third 0 sequence |
0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 3 | 3 | 2 | 2 |
0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 5 | 4 | 2 | 4 | 0 |
0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 2 | 2 | 5 | 2 |
0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 4 |
0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 3 | 1 | 1 | 1 |
The first sequence is always a sequence of zeros, but its length differs from row to row.
I will be grateful for any help.
Are you going to need to generalize this to more than 15 M variables?
Yes, I am, up to 204 M variables.
Here is a solution that is not limited by the number of variables:
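/* Example data: one row per case, one 0/1 flag per month */
data have;
    input m1-m15;
    datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* Reshape to long form: one observation per (row, month) */
data temp;
    set have;
    array v01{*} _numeric_;
    row = _n_;
    do i = 1 to dim(v01);
        m = v01{i};
        output;
    end;
    keep row m;
run;

/* Count the length of each run of identical values within a row */
data seq;
    length seqId $16;
    set temp;
    by row m notsorted;
    if first.row then seqNo = 1;
    if first.m then seqN = 0;
    seqN + 1;
    if last.m then do;
        seqId = cats("S", seqNo, "_", m);   /* S1_0, S1_1, S2_0, ... */
        output;
        seqNo + m;                          /* advance after each run of ones */
    end;
    keep row seqId seqN;
run;

/* Back to wide form: one column per run */
proc transpose data=seq out=seqT(drop=_: row);
    by row;
    id seqId;
run;

/* Append the run lengths to the original rows */
data want;
    set have;
    set seqT;
run;

proc print data=want noobs; run;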
PG
A BIG thank you, PGStats.
Here is another possibility; the raw input was stolen from PG's post.
data have;
    input m1-m15;
    datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* At most ceil(nvar/2) runs of each kind can occur in a row */
proc sql noprint;
    select ceil(nvar/2) into :dim trimmed
        from dictionary.tables
        where libname='WORK' and memname='HAVE';
quit;

data want;
    array zero(&dim) Zero1-Zero&dim;
    array one(&dim) One1-One&dim;
    set have;
    array m m:;
    /* Without this, CATS gives _cat a default length of $200, too short for 204 variables */
    length _cat $ 32767;
    _cat = cats(of m(*));
    do _i_ = 1 to &dim until (missing(_check));
        /* The i-th "word" delimited by 1s is the i-th run of zeros, and vice versa */
        zero(_i_) = ifn(lengthn(scan(_cat,_i_,'1')), lengthn(scan(_cat,_i_,'1')), .);
        one(_i_) = ifn(lengthn(scan(_cat,_i_,'0')), lengthn(scan(_cat,_i_,'0')), .);
        _check = sum(zero(_i_), one(_i_));
    end;
    drop _:;
run;
Haikuo
Thank you, Hai.kuo
You already have two good solutions, so I'll just make two comments:
1) A sequence of consecutive 0s until a 1 (or of consecutive 1s until a 0) is called a "run." The number of runs in a sequence (and the lengths of the runs) can be used as a test for statistical randomness. See "How to tell whether a sequence of heads and tails is random" on The DO Loop.
2) Mathematically, the simplest way to count the length of runs is to use a trick that I call the DIF-SIGN trick. It is used in the previously mentioned article to compute the length of runs, and it is explained in more detail in this article: "Using finite differences to estimate the maximum of a time series" on The DO Loop. Probably someone can convert the technique to a DATA step, which ought to be very fast.
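The change-point idea in comment 2 can be sketched in a DATA step. This is not the DIF-SIGN code from the linked articles, just a plain loop that closes a run wherever two adjacent values differ (that is, where the first difference is nonzero). The data set name runs, the array rlen with its variables rlen1-rlen15, and the underscore-prefixed work variables are illustrative names that do not appear in the thread; the two data rows are copied from the example data.

```sas
/* Sketch only: RUNS, rlen1-rlen15 and the _ variables are hypothetical names */
data have;
    input m1-m15;
    datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
;

data runs;
    set have;
    array m{*} m1-m15;
    array rlen{15} rlen1-rlen15;   /* at most one run per variable */
    _k = 0;                        /* index of the run being filled */
    _len = 0;                      /* length of the current run */
    do _i = 1 to dim(m);
        _len = _len + 1;
        /* A run ends at the last variable or where the next value differs. */
        /* The two cases are tested separately so m{_i + 1} is never out of range. */
        if _i = dim(m) then _end = 1;
        else _end = (m{_i + 1} ne m{_i});
        if _end then do;
            _k = _k + 1;
            rlen{_k} = _len;
            _len = 0;
        end;
    end;
    drop _:;
run;

proc print data=runs noobs; run;
```

Because every row starts with a zero, the odd-numbered rlen variables hold the zero runs and the even-numbered ones hold the one runs; unused slots stay missing. For the full data, the m1-m15 list and the size of the rlen array would be widened to 204.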
Thank you, Rick, for your valuable comments and article! That's interesting.