Calcite | Level 5

How to get the observations for subsequent values based on a condition

Can anyone help how to get the observations based on the below condition:

If 3 consecutive "N" is found in column C for a same value in A column

Eg: A1B4 A1B5 and A1B6 has 3 consecutive N's and I need all three in the output

``````A B C
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y``````

Expected output:

``````A B C
1 4 N
1 5 N
1 6 N
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N``````

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

Re: How to get the observations for subsequent values based on a condition

``````Data have;
input A B C \$;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
set have;
by a c notsorted;
group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;``````
5 REPLIES 5
PROC Star

Re: How to get the observations for subsequent values based on a condition

One approach:

data want;
k=0;
do until (last.c) ;
set have;
by a c notsorted;
k + 1;
end;
do until (last.c) ;
set have;
by a c notsorted;
if k >= 3 and c='N' then output;
end;
drop k;
run;
Tourmaline | Level 20

Re: How to get the observations for subsequent values based on a condition

``````
data have;
input A B C \$1.;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;

data want;
if _n_=1 then do;
dcl hash H ();
h.definekey  ("_n_");
h.definedata ("_s",'_e');
h.definedone ();
dcl hiter hi('h');
end;
do _n_=1 by 1 until(last.a);
set have;
by a;
if c='N' then	do;
_count+1;
if _count=1 then _s=B;
_e=b;
end;
else do;
if _count>=3 then do;
end;
_count=0;
end;
end;
c='N';
do while(hi.next()=0);
do b=_s to _e;
output;
end;
end;
h.clear();
drop _:;
run;``````
Barite | Level 11

Re: How to get the observations for subsequent values based on a condition

``````Data have;
input A B C \$;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;

data want;
cnt=0;
do until(last.c);
set have;
by c notsorted;
cnt+1;
end;

do until(last.c);
set have;
by c notsorted;
if cnt >=3 then output;
end;

drop cnt;
run;``````
Super User

Re: How to get the observations for subsequent values based on a condition

``````Data have;
input A B C \$;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
set have;
by a c notsorted;
group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;``````
Ammonite | Level 13

Re: How to get the observations for subsequent values based on a condition

You've got plenty of good offers, of which I'd prefer those based on the double DoW-loop and grouping by A C.

Another approach can be:

While passing through the file, memorize the observation number endpoints of each consecutive "N" sequence

At the break of the sequence (i.e. when C="N" or LAST.A), if the distance between the endpoints GE 2, use them to output the needed records.

Note that below, I've included an extra BY-group (A=2) to test the condition when a qualifying "N" sequence ends at the end of a BY A group.

``````data have ;
input (A B C) (:\$1.) ;
cards ;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 0 N
2 1 N
2 2 Y
2 3 N
2 4 N
2 5 N
2 6 Y
2 7 N
2 8 N
2 9 N
3 1 Y
3 2 N
3 3 N
3 4 N
3 5 N
3 6 N
3 7 N
3 8 Y
3 8 Y
;
run ;

data want ;
set have ;
by a ;
if c eq "N" and not _iorc_ then _iorc_ = _n_ ;
if (c ne "N" or last.a) and _iorc_ ;
if _n_ - not last.a - _iorc_ => 2 then do p = _iorc_ to _n_ - not last.a ;
set have point = p ;
output ;
end ;
_iorc_ = 0 ;
run ;
``````

_IORC_ (auto-retained, auto-dropped, and set to 0 at compile) tracks the record numbers where the "N" sequences begin; _N_ tracks the record numbers where they terminate. This approach can be advantageous performance-wise if the number and size of the qualifying "N" sequences are small compared to the size of the file, as it thus avoids a full second read.

Kind regards

Paul D.

Discussion stats
• 5 replies
• 528 views
• 2 likes
• 6 in conversation