Can anyone help how to get the observations based on the below condition:
If 3 consecutive "N" is found in column C for a same value in A column
Eg: A1B4 A1B5 and A1B6 has 3 consecutive N's and I need all three in the output
A B C
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
Expected output:
A B C
1 4 N
1 5 N
1 6 N
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
set have;
by a c notsorted;
group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;
Hi @jksthomas
data have;
input A B C $1.;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
data want;
if _n_=1 then do;
dcl hash H ();
h.definekey ("_n_");
h.definedata ("_s",'_e');
h.definedone ();
dcl hiter hi('h');
end;
do _n_=1 by 1 until(last.a);
set have;
by a;
if c='N' then do;
_count+1;
if _count=1 then _s=B;
_e=b;
end;
else do;
if _count>=3 then do;
h.add();
end;
_count=0;
end;
end;
c='N';
do while(hi.next()=0);
do b=_s to _e;
output;
end;
end;
h.clear();
drop _:;
run;
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data want;
cnt=0;
do until(last.c);
set have;
by c notsorted;
cnt+1;
end;
do until(last.c);
set have;
by c notsorted;
if cnt >=3 then output;
end;
drop cnt;
run;
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
set have;
by a c notsorted;
group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;
You've got plenty of good offers, of which I'd prefer those based on the double DoW-loop and grouping by A C.
Another approach can be:
While passing through the file, memorize the observation number endpoints of each consecutive "N" sequence
At the break of the sequence (i.e. when C="N" or LAST.A), if the distance between the endpoints GE 2, use them to output the needed records.
Note that below, I've included an extra BY-group (A=2) to test the condition when a qualifying "N" sequence ends at the end of a BY A group.
data have ;
input (A B C) (:$1.) ;
cards ;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 0 N
2 1 N
2 2 Y
2 3 N
2 4 N
2 5 N
2 6 Y
2 7 N
2 8 N
2 9 N
3 1 Y
3 2 N
3 3 N
3 4 N
3 5 N
3 6 N
3 7 N
3 8 Y
3 8 Y
;
run ;
data want ;
set have ;
by a ;
if c eq "N" and not _iorc_ then _iorc_ = _n_ ;
if (c ne "N" or last.a) and _iorc_ ;
if _n_ - not last.a - _iorc_ => 2 then do p = _iorc_ to _n_ - not last.a ;
set have point = p ;
output ;
end ;
_iorc_ = 0 ;
run ;
_IORC_ (auto-retained, auto-dropped, and set to 0 at compile) tracks the record numbers where the "N" sequences begin; _N_ tracks the record numbers where they terminate. This approach can be advantageous performance-wise if the number and size of the qualifying "N" sequences are small compared to the size of the file, as it thus avoids a full second read.
Kind regards
Paul D.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.