How to calculate the lengths of 0-1 sequences?

01-07-2014 05:19 PM

I am having trouble with the following calculation. I want to compute the lengths of the 0 and 1 sequences in each row. My data look like this:

| m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |

where m is a month number

I would like the **length** of each sequence of ones up to the point where a sequence of zeros starts, then the length of that sequence of zeros up to the next sequence of ones, and so on. This is what I want to get:

| m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 | First 0 sequence | First 1 sequence | Second 0 sequence | Second 1 sequence | Third 0 sequence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 3 | 3 | 2 | 2 |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 5 | 4 | 2 | 4 | 0 |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 2 | 2 | 5 | 2 |
| 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 4 |
| 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 3 | 1 | 1 | 1 |

The first sequence is always the zero sequence (but different length for each case).

I will be grateful for any help.

Accepted Solutions

Solution

01-08-2014 03:21 PM

A BIG thank you, PGStats.

All Replies


01-07-2014 06:25 PM

Are you going to need to generalize this to more than 15 M variables?


01-08-2014 02:32 AM

Yes, I am, up to 204 M variables.


01-08-2014 08:36 AM

Here is a solution that is not limited by the number of variables:

data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

data temp;
set have;
array v01{*} _numeric_;
row = _n_;
do i = 1 to dim(v01);
    m = v01{i};
    output;
    end;
keep row m;
run;

data seq;
length seqId $16;
set temp; by row m notsorted;
if first.row then seqNo = 1;
if first.m then seqN = 0;
seqN + 1;
if last.m then do;
    seqId = cats("S", seqNo, "_", m);
    output;
    seqNo + m;
    end;
keep row seqId seqN;
run;

proc transpose data=seq out=seqT(drop=_: row);
by row;
id seqId;
run;

data want;
set have; set seqT;
run;

proc print data=want noobs; run;

PG


01-08-2014 10:07 AM

Here is another possibility; the raw input was stolen from PG's post.

data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* A row of N variables has at most ceil(N/2) runs of each value */
proc sql noprint;
select ceil(nvar/2) into :dim trimmed from dictionary.tables
where libname='WORK' and memname='HAVE';
quit;

data want;
array zero(&dim) Zero1-Zero&dim;
array one(&dim) One1-One&dim;
set have;
array m m:;
/* Concatenate the row into one string, e.g. 011100011001100 */
_cat = cats(of m(*));
/* Stop as soon as neither a zero run nor a one run is found */
do _i_ = 1 to &dim until (missing(_check));
    /* With '1' as the delimiter, the _i_-th word is the _i_-th run of zeros */
    zero(_i_) = ifn(lengthn(scan(_cat,_i_,'1')), lengthn(scan(_cat,_i_,'1')), .);
    /* With '0' as the delimiter, the _i_-th word is the _i_-th run of ones */
    one(_i_) = ifn(lengthn(scan(_cat,_i_,'0')), lengthn(scan(_cat,_i_,'0')), .);
    _check = sum(zero(_i_), one(_i_));
end;
drop _:;
run;

Haikuo


01-08-2014 03:21 PM

Thank you, Hai.kuo


01-08-2014 04:12 PM

You already have two good solutions, so I'll just make two comments:

1) A stretch of consecutive 0s until a 1 (or of 1s until a 0) is called a "run." The number of runs in a sequence (and the lengths of the runs) can be used as a test for statistical randomness. See "How to tell whether a sequence of heads and tails is random" on The DO Loop.

2) Mathematically, the simplest way to count the length of runs is to use a trick that I call the DIF-SIGN trick. It is used in the previously mentioned article to compute the length of runs, and it is explained in more detail in "Using finite differences to estimate the maximum of a time series" on The DO Loop. Probably someone can convert the technique to a DATA step, which ought to be very fast.
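For what it's worth, here is one way such a DATA step conversion might look. It is only a sketch of the general idea (a run ends wherever the difference between neighboring values is nonzero), not the code from the linked articles. It assumes the HAVE data set with m1-m15 from the posts above; the output data set `runs` and the variables `runLen1-runLen15` and `nRuns` are names made up for illustration.

```sas
/* Sketch: single-pass run lengths per row.                      */
/* Assumes WORK.HAVE with 0/1 variables m1-m15 (from the thread). */
/* runs, runLen1-runLen15 and nRuns are hypothetical names.       */
data runs;
   set have;
   array m{*} m1-m15;
   array runLen{15};          /* at most one run per variable */
   nRuns = 0;
   start = 1;                 /* position where the current run begins */
   do i = 2 to dim(m) + 1;
      /* the end of the row closes the last run */
      if i > dim(m) then boundary = 1;
      /* otherwise a nonzero neighbor difference marks a boundary */
      else boundary = (m{i} - m{i-1} ne 0);
      if boundary then do;
         nRuns = nRuns + 1;
         runLen{nRuns} = i - start;   /* length of the run that just ended */
         start = i;
      end;
   end;
   drop i start boundary;
run;

proc print data=runs noobs; run;
```

Because the first run is always a run of zeros in this data, the odd-numbered `runLen` variables hold zero runs and the even-numbered ones hold one runs; unused slots stay missing. For the 204-variable case, both array definitions would need 204 in place of 15.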


01-09-2014 04:25 AM

Thank you, Rick, for your valuable comments and articles! That's interesting.