Re: Find consecutive rows

Nsas · Posted 02-11-2017 05:19 PM

I wanna count the consecutive rows (>=3) with RUN=0 between rows with RUN=10.

Have:

data have;
input ID day run;
datalines;
1 0 0
1 2 10
1 2 0
1 3 10
1 4 0
1 4 0
1 5 0
1 7 0
1 7 0
1 10 0
1 11 10
1 12 0
1 14 0
1 15 0
1 16 10
1 17 0
1 18 0
1 19 0
1 20 0
1 21 10
1 22 0
1 23 0
1 24 0
1 25 0
1 26 10
1 27 0
1 28 0
1 29 10
1 30 0
;;

want:

ID	DAY	RUN	COUNT
1	0	0	1
1	2	10	.
1	2	0	1
1	3	10	.
1	4	0	2
1	4	0	2
1	5	0	2
1	7	0	2
1	7	0	2
1	10	0	2
1	11	10	.
1	12	0	3
1	14	0	3
1	15	0	3
1	16	10	.
1	17	0	4
1	18	0	4
1	19	0	4
1	20	0	4
1	21	10	.
1	22	0	5
1	23	0	5
1	24	0	5
1	25	0	5
1	26	10	.
1	27	0	1
1	28	0	1
1	29	10	.
1	30	0	1

I do appreciate any tips to get this output.

Note: All consecutive rows with RUN=0 and day <10 will be assigned the same value.

Patrick · Posted 02-11-2017 06:52 PM

Is below a possible scenario and if yes how would the count look like??

Nsas · Posted 02-11-2017 07:16 PM

Count would be:

1

.

1

.

2

Patrick · Posted 02-11-2017 08:30 PM

Here a way to go:

data have;
input  ID day run;
datalines;
1 0 0
1 2 10
1 2 0
1 3 10
1 4 0
1 4 0
1 5 0
1 7 0
1 7 0
1 10 0
1 11 10
1 12 0
1 14 0
1 15 0
1 16 10
1 17 0
1 18 0
1 19 0
1 20 0
1 21 10
1 22 0
1 23 0
1 24 0
1 25 0
1 26 10
1 27 0
1 28 0
1 29 10
1 30 0
;
run;

data inter1 
     HaveWithGroup(keep=ID day run group);
  set have;
  by run notsorted;

  if first.run and run ne 10 then
    do;
      _group+1;
    end;

  if run=10 then call missing(group);
  else group=_group;

  if run=10 then group_cnt=.;
  else group_cnt+1;

  output HaveWithGroup;

  if last.run and run ne 10 then 
    do;
      if group_cnt<3 then count=1;
      else count+1;
      output inter1;
    end;

run;

data want;
  set HaveWithGroup;
  call missing(count);
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'inter1(keep=group count)', multidata:'n');
      h1.defineKey('group');
      h1.defineData('count');
      h1.defineDone();
    end;
  if run ne 10 then h1.find();
run;

mkeintz · Posted 02-11-2017 08:49 PM

You want to find all instances of 3+ consecutive zeroes between 10's. When such a series is found you increment the sequence number by 1. But if a shorter series is found you reset the sequence number to 1. Is that correct?

data want (drop=nzero seq);
  do Nzero=0 by 1 until (run=10 or eod);
    set have end=eod;
  end;
  if nzero>=3 then seq+1;
  else seq=1;

  count=seq;
  do until(run=10 or eod2);
    set have end=eod2;
    if run=10 then count=.;
    output;
  end;
run;

Notes:

This program assumes that run=10 or 0 - no other values appear, as in your sample data.
The "do nzero=" loop starts at zero rather than one, so that when the "run=10" record countributes to the increment of nzero, it does not produce an overcount.
I include the "or eod" and "or eod2" in the "until" conditions to accomodate the possibility that the last record of the dataset does not have run=10.

If you have run values other than 10 and 0, then counting the consecutive records between 10's is no longer the same as counting "run=0" records. You'd have to modify the program to this:

data want (drop=threezeroesfound seq);
  do until (run=10 or eod);
    set have end=eod;
    if lag2(run)=0 and lag(run)=0 and run=0 then threezeroesfound=1;
  end;
  if threezeroesfound then seq+1;
  else seq=1;
  
  count=seq;
  do until(run=10 or eod2);
    set have end=eod2;
    if run=10 then count=.;
    output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

ari · Posted 02-15-2017 10:52 AM

@mkeintz:

Thanks.

mkeintz · Posted 02-15-2017 11:00 AM

I'm not sure I fully comprehend. Please show some data before and after using the new criterion.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Find consecutive rows