Ania
Calcite | Level 5

I am having trouble with the following calculation. I want to compute the length of each sequence of 0s and 1s in every row. My data look like this:

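| m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 |
|----|----|----|----|----|----|----|----|----|-----|-----|-----|-----|-----|-----|
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |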

where m is a month number

I want the length of each sequence of ones up to the point where a sequence of zeros starts, then the length of that sequence of zeros up to the next sequence of ones, and so on. This is what I want to get:

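| m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | m9 | m10 | m11 | m12 | m13 | m14 | m15 | First 0 sequence | First 1 sequence | Second 0 sequence | Second 1 sequence | Third 0 sequence |
|----|----|----|----|----|----|----|----|----|-----|-----|-----|-----|-----|-----|------------------|------------------|-------------------|-------------------|------------------|
| 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 3 | 3 | 2 | 2 |
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 5 | 4 | 2 | 4 | 0 |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 4 | 2 | 2 | 5 | 2 |
| 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1 | 4 |
| 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 2 | 3 | 1 | 1 | 1 |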

The first sequence is always a zero sequence, but its length differs from row to row.

I will be grateful for any help.

ballardw
Super User

Are you going to need to generalize this to more than 15 M variables?

Ania
Calcite | Level 5

Yes, I am, up to 204 M variables.

PGStats
Opal | Level 21

Here is a solution that is not limited by the number of variables:

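/* Sample data from the question */
data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* Reshape to long form: one observation per value, tagged with its row */
data temp;
set have;
array v01{*} _numeric_;
row = _n_;
do i = 1 to dim(v01);
     m = v01{i};
     output;
     end;
keep row m;
run;

/* Collapse each run of identical values into one observation.
   NOTSORTED lets first.m and last.m flag every change of m within a row. */
data seq;
length seqId $16;
set temp; by row m notsorted;
if first.row then seqNo = 1;
if first.m then seqN = 0;
seqN + 1;                               /* length of the current run */
if last.m then do;
     seqId = cats("S", seqNo, "_", m);  /* S1_0, S1_1, S2_0, ... */
     output;
     seqNo + m;                         /* advance only after a run of 1s */
     end;
keep row seqId seqN;
run;

/* Back to wide form: one column per run, named by seqId */
proc transpose data=seq out=seqT(drop=_: row);
by row;
id seqId;
run;

/* Attach the run lengths to the original rows (both are in row order) */
data want;
set have; set seqT;
run;

proc print data=want noobs; run;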
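
To see how the sequence names come about, it can help to look at the intermediate SEQ data set for a single row. This is only a minimal sketch, assuming the steps above have been run so that WORK.SEQ exists:

```sas
/* Show the runs found in the first row of HAVE */
proc print data=seq noobs;
   where row = 1;
   var seqId seqN;
run;
```

For the first row (0 1 1 1 0 0 0 1 1 0 0 1 1 0 0) this should list S1_0 = 1, S1_1 = 3, S2_0 = 3, S2_1 = 2, S3_0 = 2, S3_1 = 2 and S4_0 = 2. The counter seqNo advances only after a run of ones, so each run of zeros shares its number with the run of ones that follows it.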

PG
Ania
Calcite | Level 5

A BIG thank you, PGStats.

Haikuo
Onyx | Level 15

Here is another possibility; the raw input is borrowed from PG's post.

data have;
input m1-m15;
datalines;
0 1 1 1 0 0 0 1 1 0 0 1 1 0 0
0 0 0 0 0 1 1 1 1 0 0 1 1 1 1
0 0 0 0 1 1 0 0 1 1 1 1 1 0 0
0 0 1 1 0 1 0 0 0 0 1 1 1 1 1
0 0 1 1 1 0 1 0 1 1 1 0 0 1 0
;

/* A row of nvar values has at most ceil(nvar/2) runs of zeros (or of ones) */
proc sql noprint;
select ceil(nvar/2) into :dim trimmed from dictionary.tables
   where libname='WORK' AND MEMNAME='HAVE';
QUIT;

data want;
  ARRAY ZERO(&DIM) Zero1-Zero&dim;
  ARRAY ONE(&DIM) One1-One&dim;
  set have;
  ARRAY M M:;
  /* Without an explicit length, _cat defaults to 200 characters,
     which is too short for 204 variables */
  length _cat $ 1024;
  _cat=CATS(OF M(*));
  do _i_=1 to &dim until (missing(_check));
    /* Words delimited by '1' are the runs of zeros, and vice versa */
    zero(_i_)=ifn(lengthn(scan(_cat,_i_,'1')), lengthn(scan(_cat,_i_,'1')),.);
    one(_i_)=ifn(lengthn(scan(_cat,_i_,'0')), lengthn(scan(_cat,_i_,'0')),.);
    _check=sum(zero(_i_),one(_i_));
  end;
  drop _:;
run;
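
The idea behind the SCAN calls may be easier to see in isolation. The following is only an illustrative sketch: the string is the first row of HAVE written out by hand, and the variable names firstZeros, secondZeros and firstOnes are made up for this example.

```sas
data _null_;
   s = '011100011001100';                    /* row 1 of HAVE, concatenated */
   /* Treating '1' as the delimiter leaves the runs of zeros as words */
   firstZeros  = lengthn(scan(s, 1, '1'));   /* word '0'   */
   secondZeros = lengthn(scan(s, 2, '1'));   /* word '000' */
   /* Treating '0' as the delimiter leaves the runs of ones as words */
   firstOnes   = lengthn(scan(s, 1, '0'));   /* word '111' */
   put firstZeros= secondZeros= firstOnes=;
run;
```

The log should show firstZeros=1 secondZeros=3 firstOnes=3, which matches the first three run lengths requested for that row.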


Haikuo

Ania
Calcite | Level 5

Thank you, Hai.kuo

Rick_SAS
SAS Super FREQ

You already have two good solutions, so I'll just make two comments:

1) The number of 0s until a 1 (or the number of 1s until a 0) is called a "run." The number of runs in a sequence (and the lengths of the runs) can be used as a test for statistical randomness. See "How to tell whether a sequence of heads and tails is random" on The DO Loop.

2) Mathematically, the simplest way to count the length of runs is to use a trick that I call the DIF-SIGN trick. The previously mentioned article uses it to compute the lengths of runs, and it is explained in more detail in "Using finite differences to estimate the maximum of a time series" on The DO Loop. Someone can probably convert the technique to a DATA step, which ought to be very fast.
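
As a rough illustration of what such a DATA step might look like, here is a single-pass sketch. It is not the IML code from the articles. It only borrows the idea of comparing each value with its predecessor, so a nonzero difference marks the start of a new run. It assumes WORK.HAVE contains m1-m15 with no missing values, and the names runs, runLen1-runLen15 and nRuns are invented for this example.

```sas
data runs;
   set have;
   array m{*} m1-m15;            /* assumption: widen to m1-m204 for the full data */
   array runLen{15};             /* creates runLen1-runLen15; 15 values give at most 15 runs */
   nRuns = 1;                    /* index of the run currently being counted */
   runLen{1} = 1;                /* m1 opens the first run */
   do i = 2 to dim(m);
      if m{i} ne m{i-1} then do; /* value changed: a new run starts here */
         nRuns = nRuns + 1;
         runLen{nRuns} = 1;
      end;
      else runLen{nRuns} = runLen{nRuns} + 1;   /* same value: extend the run */
   end;
   drop i;
run;
```

For the first row of HAVE this gives nRuns = 7 and run lengths 1, 3, 3, 2, 2, 2, 2, with the remaining runLen variables left missing. Because the first sequence in every row is a zero sequence, the odd-numbered runLen variables hold the runs of zeros and the even-numbered ones hold the runs of ones.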

Ania
Calcite | Level 5

Thank you, Rick for your valuable comments and article! That's interesting.
