Can anyone help how to get the observations based on the below condition:
If 3 consecutive "N" is found in column C for a same value in A column
Eg: A1B4 A1B5 and A1B6 has 3 consecutive N's and I need all three in the output
A B C
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
Expected output:
A B C
1 4 N
1 5 N
1 6 N
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
set have;
by a c notsorted;
group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;
Hi @jksthomas
data have;
input A B C $1.;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
data want;
if _n_=1 then do;
dcl hash H ();
h.definekey ("_n_");
h.definedata ("_s",'_e');
h.definedone ();
dcl hiter hi('h');
end;
do _n_=1 by 1 until(last.a);
set have;
by a;
if c='N' then do;
_count+1;
if _count=1 then _s=B;
_e=b;
end;
else do;
if _count>=3 then do;
h.add();
end;
_count=0;
end;
end;
c='N';
do while(hi.next()=0);
do b=_s to _e;
output;
end;
end;
h.clear();
drop _:;
run;
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data want;
cnt=0;
do until(last.c);
set have;
by c notsorted;
cnt+1;
end;
do until(last.c);
set have;
by c notsorted;
if cnt >=3 then output;
end;
drop cnt;
run;
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
set have;
by a c notsorted;
group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;
You've got plenty of good offers, of which I'd prefer those based on the double DoW-loop and grouping by A C.
Another approach can be:
While passing through the file, memorize the observation number endpoints of each consecutive "N" sequence
At the break of the sequence (i.e. when C="N" or LAST.A), if the distance between the endpoints GE 2, use them to output the needed records.
Note that below, I've included an extra BY-group (A=2) to test the condition when a qualifying "N" sequence ends at the end of a BY A group.
data have ;
input (A B C) (:$1.) ;
cards ;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 0 N
2 1 N
2 2 Y
2 3 N
2 4 N
2 5 N
2 6 Y
2 7 N
2 8 N
2 9 N
3 1 Y
3 2 N
3 3 N
3 4 N
3 5 N
3 6 N
3 7 N
3 8 Y
3 8 Y
;
run ;
data want ;
set have ;
by a ;
if c eq "N" and not _iorc_ then _iorc_ = _n_ ;
if (c ne "N" or last.a) and _iorc_ ;
if _n_ - not last.a - _iorc_ => 2 then do p = _iorc_ to _n_ - not last.a ;
set have point = p ;
output ;
end ;
_iorc_ = 0 ;
run ;
_IORC_ (auto-retained, auto-dropped, and set to 0 at compile) tracks the record numbers where the "N" sequences begin; _N_ tracks the record numbers where they terminate. This approach can be advantageous performance-wise if the number and size of the qualifying "N" sequences are small compared to the size of the file, as it thus avoids a full second read.
Kind regards
Paul D.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.