How to calculate the lengths of 0-1 sequences?

Occasional Contributor
Posts: 5
I want to calculate the lengths of the 0 and 1 sequences in each row. I have the following data:

m1  m2  m3  m4  m5  m6  m7  m8  m9  m10 m11 m12 m13 m14 m15
0   1   1   1   0   0   0   1   1   0   0   1   1   0   0
0   0   0   0   0   1   1   1   1   0   0   1   1   1   1
0   0   0   0   1   1   0   0   1   1   1   1   1   0   0
0   0   1   1   0   1   0   0   0   0   1   1   1   1   1
0   0   1   1   1   0   1   0   1   1   1   0   0   1   0

where m is a month number.

I want the length of each sequence of zeros until a sequence of ones starts, then the length of that sequence of ones until the next sequence of zeros starts, and so on. This is what I want to get:

m1  m2  m3  m4  m5  m6  m7  m8  m9  m10 m11 m12 m13 m14 m15 First 0 sequence   First 1 sequence   Second 0 sequence  Second 1 sequence  Third 0 sequence
0   1   1   1   0   0   0   1   1   0   0   1   1   0   0   1                  3                  3                  2                  2
0   0   0   0   0   1   1   1   1   0   0   1   1   1   1   5                  4                  2                  4                  0
0   0   0   0   1   1   0   0   1   1   1   1   1   0   0   4                  2                  2                  5                  2
0   0   1   1   0   1   0   0   0   0   1   1   1   1   1   2                  2                  1                  1                  4
0   0   1   1   1   0   1   0   1   1   1   0   0   1   0   2                  3                  1                  1                  1

The first sequence is always a zero sequence (but its length differs from row to row).

I will be grateful for any help.


All Replies
Super User
Posts: 10,538

Re: How to calculate the lengths of 0-1 sequences?

Are you going to need to generalize this to more than 15 M variables?

Occasional Contributor
Posts: 5

Re: How to calculate the lengths of 0-1 sequences?

Yes, I am: up to 204 m variables (m1-m204).

Respected Advisor
Posts: 4,659

Re: How to calculate the lengths of 0-1 sequences?

Here is a solution that is not limited by the number of variables:

data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

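/* Reshape to long format: one observation per row and month */
data temp;
set have;
array v01{*} _numeric_;   /* m1-m15 only: row, i and m do not exist yet */
row = _n_;
do i = 1 to dim(v01);
     m = v01{i};
     output;
     end;
keep row m;
run;

/* Count consecutive equal values (runs) within each row */
data seq;
length seqId $16;
set temp; by row m notsorted;   /* NOTSORTED: a new BY group starts whenever m changes */
if first.row then seqNo = 1;
if first.m then seqN = 0;
seqN + 1;                       /* length of the current run so far */
if last.m then do;
     seqId = cats("S", seqNo, "_", m);   /* S1_0, S1_1, S2_0, ... */
     output;
     seqNo + m;                 /* move to the next pair after each run of ones */
     end;
keep row seqId seqN;
run;

/* Back to wide format: one column per run, named by seqId */
proc transpose data=seq out=seqT(drop=_: row);
by row;
id seqId;
run;

/* One-to-one merge of the original rows with their run lengths */
data want;
set have; set seqT;
run;

proc print data=want noobs; run;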

PG
Solution
‎01-08-2014 03:21 PM
Occasional Contributor
Posts: 5

Re: How to calculate the lengths of 0-1 sequences?

A BIG thank you, PGStats.

Respected Advisor
Posts: 3,124

Re: How to calculate the lengths of 0-1 sequences?

Here is another possibility; the raw input is borrowed from PG's post.

data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* Upper bound on the number of zero (or one) runs: half the variable count */
proc sql noprint;
select ceil(nvar/2) into :dim trimmed from dictionary.tables
   where libname='WORK' AND MEMNAME='HAVE';
QUIT;

data want;
  ARRAY ZERO(&DIM) Zero1-Zero&dim;
  ARRAY ONE(&DIM) One1-One&dim;
  set have;
  ARRAY M M:;
  length _cat $ 1024;   /* CATS would otherwise default to $200, too short for 204 variables */
  _cat=CATS(OF M(*));   /* e.g. "011100011001100" */
  do _i_=1 to &dim until (missing(_check));
    /* with '1' as the delimiter, SCAN returns the _i_-th run of zeros, and vice versa */
    zero(_i_)=ifn(lengthn(scan(_cat,_i_,'1')), lengthn(scan(_cat,_i_,'1')),.);
    one(_i_)=ifn(lengthn(scan(_cat,_i_,'0')), lengthn(scan(_cat,_i_,'0')),.);
    _check=sum(zero(_i_),one(_i_));
  end;
  drop _:;
run;
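
To see why SCAN recovers the runs, it may help to apply it to a single concatenated row by hand. The step below is only an illustrative sketch and not part of the original post; the variable names i, z, o, n_zero and n_one are made up for the demonstration.

```sas
data _null_;
   length z o $ 15;
   _cat = "011100011001100";          /* first row of HAVE, concatenated */
   do i = 1 to 4;
      z = scan(_cat, i, "1");         /* i-th run of zeros: ones act as delimiters */
      o = scan(_cat, i, "0");         /* i-th run of ones: zeros act as delimiters */
      n_zero = lengthn(z);            /* LENGTHN returns 0 when there is no i-th run */
      n_one  = lengthn(o);
      put i= z= n_zero= o= n_one=;
   end;
run;
```

For this row the zero runs are 0, 000, 00 and 00, and the one runs are 111, 11 and 11, so the fourth lookup for ones comes back blank with length 0. That blank is what the IFN call in the solution turns into a missing value.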


Haikuo

Occasional Contributor
Posts: 5

Re: How to calculate the lengths of 0-1 sequences?

Thank you, Hai.kuo

SAS Super FREQ
Posts: 3,483

Re: How to calculate the lengths of 0-1 sequences?

You already have two good solutions, so I'll just make two comments:

1) The number of 0s until a 1 (or the number of 1s until a 0) is called a "run." The number of runs in a sequence (and the length of the runs) can be used as a test for statistical randomness. See "How to tell whether a sequence of heads and tails is random" (The DO Loop).

2) Mathematically, the simplest way to count the length of runs is to use a trick that I call the DIF-SIGN trick. It is used in the previously mentioned article to compute the length of runs. It is explained in more detail in this article: "Using finite differences to estimate the maximum of a time series" (The DO Loop). Probably someone can convert the technique to a DATA step, which ought to be very fast.
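
As a rough sketch of what such a DATA step could look like: this is not the code from the articles mentioned above, only the same underlying idea that a nonzero difference between neighbouring values marks the start of a new run. The array name runlen and its variables runlen1-runlen15 are made up here, and the step assumes there are no missing values in m1-m15.

```sas
data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

data want;
set have;
array m{*} m1-m15;
array runlen{15};                 /* runlen1-runlen15: at most one run per variable */
k = 1;                            /* index of the current run */
runlen{1} = 1;                    /* the first value opens the first run */
do i = 2 to dim(m);
     if m{i} - m{i-1} ne 0 then k = k + 1;   /* nonzero difference: a new run starts */
     runlen{k} = sum(runlen{k}, 1);          /* SUM treats the initial missing value as 0 */
     end;
drop i k;
run;

proc print data=want noobs; run;
```

Because the first sequence is always a zero sequence, the odd-numbered runlen variables hold the zero runs and the even-numbered ones hold the one runs. For the first row this gives 1 3 3 2 2 2 2, with the remaining variables left missing. For 204 variables, the variable list m1-m15 and the dimension 15 would both need to be changed.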

Occasional Contributor
Posts: 5

Re: How to calculate the lengths of 0-1 sequences?

Thank you, Rick, for your valuable comments and article! That's interesting.
