Ania
Calcite | Level 5

I have a problem calculating the following: I want the length of each sequence of 0s and 1s in every row. My data look like this:

m1  m2  m3  m4  m5  m6  m7  m8  m9  m10 m11 m12 m13 m14 m15
0   1   1   1   0   0   0   1   1   0   0   1   1   0   0
0   0   0   0   0   1   1   1   1   0   0   1   1   1   1
0   0   0   0   1   1   0   0   1   1   1   1   1   0   0
0   0   1   1   0   1   0   0   0   0   1   1   1   1   1
0   0   1   1   1   0   1   0   1   1   1   0   0   1   0

where the number after m is the month number.

I want the length of each sequence of ones up to the point where a sequence of zeros starts, then the length of that sequence of zeros up to the next sequence of ones, and so on. This is what I want to get:

m1  m2  m3  m4  m5  m6  m7  m8  m9  m10 m11 m12 m13 m14 m15 First 0 sequence   First 1 sequence   Second 0 sequence  Second 1 sequence  Third 0 sequence
0   1   1   1   0   0   0   1   1   0   0   1   1   0   0   1                  3                  3                  2                  2
0   0   0   0   0   1   1   1   1   0   0   1   1   1   1   5                  4                  2                  4                  0
0   0   0   0   1   1   0   0   1   1   1   1   1   0   0   4                  2                  2                  5                  2
0   0   1   1   0   1   0   0   0   0   1   1   1   1   1   2                  2                  1                  1                  4
0   0   1   1   1   0   1   0   1   1   1   0   0   1   0   2                  3                  1                  1                  1

The first sequence is always a zero sequence (but its length differs from row to row).

I will be grateful for any help.

ballardw
Super User

Are you going to need to generalize this to more than 15 M variables?

Ania
Calcite | Level 5

Yes, I am, up to 204 M variables.

PGStats
Opal | Level 21

Here is a solution that is not limited by the number of variables:

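data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* Reshape to long form: one observation per original row and month */
data temp;
set have;
array v01{*} _numeric_;   /* every numeric variable read from have */
row = _n_;
do i = 1 to dim(v01);
     m = v01{i};
     output;
     end;
keep row m;
run;

/* Count the length of each run; NOTSORTED makes BY groups follow data order */
data seq;
length seqId $16;
set temp; by row m notsorted;
if first.row then seqNo = 1;
if first.m then seqN = 0;
seqN + 1;
if last.m then do;
     seqId = cats("S", seqNo, "_", m);   /* S1_0, S1_1, S2_0, S2_1, ... */
     output;
     seqNo + m;                          /* advance after each run of 1s */
     end;
keep row seqId seqN;
run;

/* One column per run, one observation per original row */
proc transpose data=seq out=seqT(drop=_: row);
by row;
id seqId;
run;

/* Put the run lengths next to the original variables */
data want;
set have; set seqT;
run;

proc print data=want noobs; run;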

PG
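
For readers unfamiliar with BY-group processing on unsorted data, here is a minimal sketch of the counting step on its own. The data set names long and runs and the variable len are made up for illustration; the nine values are the first nine months of the first row above.

```sas
/* Illustrative data: the first nine values of row 1, one value per observation */
data long;
  input m @@;
  datalines;
0 1 1 1 0 0 0 1 1
;

/* NOTSORTED starts a new BY group every time m changes value */
data runs;
  set long;
  by m notsorted;
  if first.m then len = 0;   /* reset the counter at the start of a run */
  len + 1;                   /* sum statement: len is retained across observations */
  if last.m then output;     /* write one observation per run */
run;

proc print data=runs noobs; run;
```

This should give four observations with len equal to 1, 3, 3 and 2. Without NOTSORTED the DATA step would stop with an error, because m is not in sorted order.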

Ania
Calcite | Level 5

A BIG thank you, PGStats.

Haikuo
Onyx | Level 15

Here is another possibility; the raw input is borrowed from PG's post.

data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

proc sql noprint;
  select ceil(nvar/2) into :dim trimmed from dictionary.tables
    where libname='WORK' AND MEMNAME='HAVE';
QUIT;

data want;
  ARRAY ZERO(&DIM) Zero1-Zero&dim;
  ARRAY ONE(&DIM) One1-One&dim;
  set have;
  ARRAY M M:;
  _cat=CATS(OF M(*));
  do _i_=1 to &dim until (missing(_check));
    zero(_i_)=ifn(lengthn(scan(_cat,_i_,'1')), lengthn(scan(_cat,_i_,'1')),.);
    one(_i_)=ifn(lengthn(scan(_cat,_i_,'0')), lengthn(scan(_cat,_i_,'0')),.);
    _check=sum(zero(_i_),one(_i_));
  end;
  drop _:;
run;


Haikuo
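
A small sketch of why the SCAN calls recover the run lengths: with '1' as the delimiter, the words of the concatenated string are the runs of 0s, and with '0' as the delimiter they are the runs of 1s. The data set name scan_demo and the variables nz and n1 are illustrative only; the string is the first row of have.

```sas
data scan_demo;
  _cat = '011100011001100';            /* row 1 of have, concatenated */
  do i = 1 to 4;
    nz = lengthn(scan(_cat, i, '1'));  /* length of the i-th run of 0s */
    n1 = lengthn(scan(_cat, i, '0'));  /* length of the i-th run of 1s */
    output;
  end;
  keep i nz n1;
run;

proc print data=scan_demo noobs; run;
```

For this string the zero runs have lengths 1, 3, 2, 2 and the one runs 3, 2, 2. The fourth SCAN with '0' returns a blank, so n1 is 0 there, which is the case the IFN call above turns into a missing value. The pairing of Zero1 with One1 relies on every row starting with 0, as stated in the question; if a row started with 1, One1 would hold a run that precedes Zero1.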

Ania
Calcite | Level 5

Thank you, Hai.kuo

Rick_SAS
SAS Super FREQ

You already have two good solutions, so I'll just make two comments:

1) The number of 0s until a 1 (or the number of 1s until a 0) is called a "run." The number of runs in a sequence (and the length of runs) can be used as a test for statistical randomness. See "How to tell whether a sequence of heads and tails is random" on The DO Loop.

2) Mathematically, the simplest way to count the length of runs is to use a trick that I call the DIF-SIGN trick. It is used in the previously mentioned article to compute the length of runs, and it is explained in more detail in "Using finite differences to estimate the maximum of a time series" on The DO Loop. Probably someone can convert the technique to a DATA step, which ought to be very fast.
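
One possible DATA step version of the boundary idea, offered as a sketch rather than a translation of the IML code in those articles: a run ends wherever two neighbouring values differ, that is, where the first difference is nonzero. The names sample, runs_long, runNo, value and len are invented for this example, it assumes there are no missing values, and only two rows of the sample data are used.

```sas
data sample;
  input m1-m15;
  datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
;

/* One pass over each row; output one observation per run (long form) */
data runs_long;
  set sample;
  array v{*} m1-m15;          /* change the list here for more months */
  row   = _n_;
  runNo = 0;
  len   = 1;                  /* the first value always starts a run */
  do i = 2 to dim(v) + 1;
    /* a run ends past the last element or where neighbours differ */
    if i > dim(v) then boundary = 1;
    else boundary = (v{i} ne v{i-1});
    if boundary then do;
      runNo = runNo + 1;
      value = v{i-1};         /* 0 or 1: the value that made up the run */
      output;
      len = 1;                /* the current element starts the next run */
    end;
    else len = len + 1;
  end;
  keep row runNo value len;
run;

proc print data=runs_long noobs; run;
```

For the first row this should produce run lengths 1, 3, 3, 2, 2, 2, 2. The result has one observation per run, so it would still need a PROC TRANSPOSE step like the one in PGStats's answer to get one column per run.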

Ania
Calcite | Level 5

Thank you, Rick, for your valuable comments and articles! That's interesting.
