BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jksthomas
Calcite | Level 5

Can anyone help how to get the observations based on the below condition:

 

If 3 consecutive "N" is found in column C for a same value in A column 

 

Eg: A1B4 A1B5 and A1B6 has 3 consecutive N's and I need all three in the output

 

A B C
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y

 

Expected output:

A B C
1 4 N
1 5 N
1 6 N
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
 set have;
 by a c notsorted;
 group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;

View solution in original post

5 REPLIES 5
Astounding
PROC Star
One approach:

data want;
k=0;
do until (last.c) ;
set have;
by a c notsorted;
k + 1;
end;
do until (last.c) ;
set have;
by a c notsorted;
if k >= 3 and c='N' then output;
end;
drop k;
run;
novinosrin
Tourmaline | Level 20

Hi @jksthomas 

 


data have;
input A B C $1.;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;


data want;
 if _n_=1 then do;
 dcl hash H ();
 h.definekey  ("_n_");
 h.definedata ("_s",'_e');
 h.definedone ();
 dcl hiter hi('h');
 end;
 do _n_=1 by 1 until(last.a);
  set have;
  by a;
  if c='N' then	do;
   _count+1;
   if _count=1 then _s=B;
   _e=b;
  end;
  else do;
   if _count>=3 then do;
   h.add();
   end;
   _count=0;
  end;
 end;
 c='N';
 do while(hi.next()=0);
  do b=_s to _e;
   output;
  end;
 end;
 h.clear();
 drop _:;
run;
r_behata
Barite | Level 11
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;

data want;
	cnt=0;
	do until(last.c);
		set have;
		by c notsorted;
		cnt+1;
	end;

	do until(last.c);
		set have;
		by c notsorted;
		if cnt >=3 then output;
	end;

	drop cnt;
run;
Ksharp
Super User
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
 set have;
 by a c notsorted;
 group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;
hashman
Ammonite | Level 13

@jksthomas:

You've got plenty of good offers, of which I'd prefer those based on the double DoW-loop and grouping by A C.

Another approach can be:

While passing through the file, memorize the observation number endpoints of each consecutive "N" sequence

At the break of the sequence (i.e. when C="N" or LAST.A), if the distance between the endpoints GE 2, use them to output the needed records.  

Note that below, I've included an extra BY-group (A=2) to test the condition when a qualifying "N" sequence ends at the end of a BY A group. 

data have ;                                                                                                                             
  input (A B C) (:$1.) ;                                                                                                                
  cards ;                                                                                                                               
1 1 N                                                                                                                                   
1 2 N                                                                                                                                   
1 3 Y                                                                                                                                   
1 4 N                                                                                                                                   
1 5 N                                                                                                                                   
1 6 N                                                                                                                                   
1 7 Y                                                                                                                                   
1 8 Y                                                                                                                                   
1 9 N                                                                                                                                   
2 0 N                                                                                                                                   
2 1 N                                                                                                                                   
2 2 Y                                                                                                                                   
2 3 N                                                                                                                                   
2 4 N                                                                                                                                   
2 5 N                                                                                                                                   
2 6 Y                                                                                                                                   
2 7 N                                                                                                                                   
2 8 N                                                                                                                                   
2 9 N                                                                                                                                   
3 1 Y                                                                                                                                   
3 2 N                                                                                                                                   
3 3 N                                                                                                                                   
3 4 N                                                                                                                                   
3 5 N                                                                                                                                   
3 6 N                                                                                                                                   
3 7 N                                                                                                                                   
3 8 Y                                                                                                                                   
3 8 Y                                                                                                                                   
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  set have ;                                                                                                                            
  by a ;                                                                                                                                
  if c eq "N" and not _iorc_ then _iorc_ = _n_ ;                                                                                        
  if (c ne "N" or last.a) and _iorc_ ;                                                                                                  
  if _n_ - not last.a - _iorc_ => 2 then do p = _iorc_ to _n_ - not last.a ;                                                            
    set have point = p ;                                                                                                                
    output ;                                                                                                                            
  end ;                                                                                                                                 
  _iorc_ = 0 ;                                                                                                                          
run ;                                      

_IORC_ (auto-retained, auto-dropped, and set to 0 at compile) tracks the record numbers where the "N" sequences begin; _N_ tracks the record numbers where they terminate. This approach can be advantageous performance-wise if the number and size of the qualifying "N" sequences are small compared to the size of the file, as it thus avoids a full second read.

 

Kind regards

Paul D. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1568 views
  • 2 likes
  • 6 in conversation