## How to calculate the lengths of 0-1 sequences?

Solved
Occasional Contributor
Posts: 5

I am having trouble with the following calculation. I want to compute the lengths of the sequences of 0s and 1s in each row. My data look like this:

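| m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |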

where the number after m is the month number.

I want the length of each sequence of ones up to the point where a sequence of zeros starts, then the length of that sequence of zeros up to the next sequence of ones, and so on. This is what I want to get:

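| m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 | First 0 sequence | First 1 sequence | Second 0 sequence | Second 1 sequence | Third 0 sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 3 | 3 | 2 | 2 |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 5 | 4 | 2 | 4 | |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 2 | 2 | 5 | 2 |
| 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 4 |
| 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 3 | 1 | 1 | 1 |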

The first sequence is always the zero sequence (but different length for each case).

I will be grateful for any help.

Accepted Solutions
Solution
‎01-08-2014 03:21 PM
Occasional Contributor
Posts: 5

## Re: How to calculate the lengths of 0-1 sequences?

A BIG thank you, PGStats.

All Replies
Super User
Posts: 13,523

## Re: How to calculate the lengths of 0-1 sequences?

Are you going to need to generalize this to more than 15 M variables?

Occasional Contributor
Posts: 5

## Re: How to calculate the lengths of 0-1 sequences?

Yes, I am, up to 204 M variables.

Posts: 5,526

## Re: How to calculate the lengths of 0-1 sequences?

Here is a solution that is not limited by the number of variables:

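```sas
data have;
    input m1-m15;
    datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* Reshape to long form: one observation per value, tagged with its row */
data temp;
    set have;
    array v01{*} _numeric_;
    row = _n_;
    do i = 1 to dim(v01);
        m = v01{i};
        output;
    end;
    keep row m;
run;

/* NOTSORTED makes FIRST.m / LAST.m flag the start and end of each run */
data seq;
    length seqId $16;
    set temp; by row m notsorted;
    if first.row then seqNo = 1;
    if first.m then seqN = 0;
    seqN + 1;                              /* length of the current run */
    if last.m then do;
        seqId = cats("S", seqNo, "_", m);  /* S1_0, S1_1, S2_0, ... */
        output;
        seqNo + m;                         /* advance after each run of ones */
    end;
    keep row seqId seqN;
run;

/* One column per run, one observation per original row */
proc transpose data=seq out=seqT(drop=_: row);
    by row;
    id seqId;
run;

/* Put the run lengths next to the original data */
data want;
    set have; set seqT;
run;

proc print data=want noobs; run;
```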

PG

Posts: 3,167

## Re: How to calculate the lengths of 0-1 sequences?

Here is another possibility; the raw input was stolen from PG's post.

```sas
data have;
    input m1-m15;
    datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* Upper bound on the number of zero runs (and of one runs) per row */
proc sql noprint;
    select ceil(nvar/2) into :dim trimmed from dictionary.tables
    where libname='WORK' AND MEMNAME='HAVE';
QUIT;

data want;
    ARRAY ZERO(&DIM) Zero1-Zero&dim;
    ARRAY ONE(&DIM) One1-One&dim;
    set have;
    ARRAY M M:;
    /* CATS returns length 200 by default; widen it for more than 200 M variables */
    length _cat $ 400;
    _cat=CATS(OF M(*));
    do _i_=1 to &dim until (missing(_check));
        /* the i-th word delimited by '1' is the i-th run of zeros, and vice versa */
        zero(_i_)=ifn(lengthn(scan(_cat,_i_,'1')), lengthn(scan(_cat,_i_,'1')),.);
        one(_i_)=ifn(lengthn(scan(_cat,_i_,'0')), lengthn(scan(_cat,_i_,'0')),.);
        _check=sum(zero(_i_),one(_i_));
    end;
    drop _:;
run;
```

Haikuo

Occasional Contributor
Posts: 5

## Re: How to calculate the lengths of 0-1 sequences?

Thank you, Hai.kuo

SAS Super FREQ
Posts: 4,240

## Re: How to calculate the lengths of 0-1 sequences?

You already have two good solutions, so I'll just make two comments:

1) The number of 0s until a 1 (or the number of 1s until a 0) is called a "run." The number of runs in a sequence (and the length of the runs) can be used as a test for statistical randomness. See "How to tell whether a sequence of heads and tails is random" on The DO Loop.

2) Mathematically, the simplest way to count the length of runs is to use a trick that I call the DIF-SIGN trick. It is used in the previously mentioned article to compute the length of runs, and it is explained in more detail in "Using finite differences to estimate the maximum of a time series" on The DO Loop. Probably someone can convert the technique to a DATA step, which ought to be very fast.
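
One possible DATA step version of the idea in point 2, as a rough sketch rather than the code from the article: a run boundary is any position where the difference between neighbouring values is nonzero, so a single pass over the array is enough. The dataset name `runs` and the variables `runLen1`-`runLen15` and `nRuns` are made up for illustration, and the sketch assumes 15 variables `m1`-`m15` with no missing values.

```sas
/* Sketch only. Assumptions: 15 variables m1-m15, no missing values. */
/* RUNS, runLen1-runLen15 and nRuns are hypothetical names.          */
data have;
    input m1-m15;
    datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
;

data runs;
    set have;
    array m{*} m1-m15;
    array runLen{15};          /* at most one run per value */
    k = 1;                     /* index of the current run */
    runLen{1} = 1;             /* the first value opens the first run */
    do i = 2 to dim(m);
        /* a nonzero difference between neighbours marks a new run */
        if m{i} - m{i-1} ne 0 then k = k + 1;
        runLen{k} = sum(runLen{k}, 1);   /* SUM treats a missing count as 0 */
    end;
    nRuns = k;                 /* number of runs in this row */
    drop i k;
run;

proc print data=runs noobs; run;
```

Because the first run is always a run of zeros in this data, the odd-numbered `runLen` variables hold the zero runs and the even-numbered ones hold the one runs. For the first data row this gives `runLen1`-`runLen7` = 1 3 3 2 2 2 2 and `nRuns` = 7; for the second row, `runLen1`-`runLen4` = 5 4 2 4 and `nRuns` = 4. For more than 15 variables, the two array sizes would need to be changed to match.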

Occasional Contributor
Posts: 5