BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jksthomas
Calcite | Level 5

Can anyone help how to get the observations based on the below condition:

 

If 3 consecutive "N" is found in column C for a same value in A column 

 

Eg: A1B4 A1B5 and A1B6 has 3 consecutive N's and I need all three in the output

 

A B C
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y

 

Expected output:

A B C
1 4 N
1 5 N
1 6 N
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
 set have;
 by a c notsorted;
 group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;

View solution in original post

5 REPLIES 5
Astounding
PROC Star
One approach:

data want;
k=0;
do until (last.c) ;
set have;
by a c notsorted;
k + 1;
end;
do until (last.c) ;
set have;
by a c notsorted;
if k >= 3 and c='N' then output;
end;
drop k;
run;
novinosrin
Tourmaline | Level 20

Hi @jksthomas 

 


data have;
input A B C $1.;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;


data want;
 if _n_=1 then do;
 dcl hash H ();
 h.definekey  ("_n_");
 h.definedata ("_s",'_e');
 h.definedone ();
 dcl hiter hi('h');
 end;
 do _n_=1 by 1 until(last.a);
  set have;
  by a;
  if c='N' then	do;
   _count+1;
   if _count=1 then _s=B;
   _e=b;
  end;
  else do;
   if _count>=3 then do;
   h.add();
   end;
   _count=0;
  end;
 end;
 c='N';
 do while(hi.next()=0);
  do b=_s to _e;
   output;
  end;
 end;
 h.clear();
 drop _:;
run;
r_behata
Barite | Level 11
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;

data want;
	cnt=0;
	do until(last.c);
		set have;
		by c notsorted;
		cnt+1;
	end;

	do until(last.c);
		set have;
		by c notsorted;
		if cnt >=3 then output;
	end;

	drop cnt;
run;
Ksharp
Super User
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
 set have;
 by a c notsorted;
 group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;
hashman
Ammonite | Level 13

@jksthomas:

You've got plenty of good offers, of which I'd prefer those based on the double DoW-loop and grouping by A C.

Another approach can be:

While passing through the file, memorize the observation number endpoints of each consecutive "N" sequence

At the break of the sequence (i.e. when C="N" or LAST.A), if the distance between the endpoints GE 2, use them to output the needed records.  

Note that below, I've included an extra BY-group (A=2) to test the condition when a qualifying "N" sequence ends at the end of a BY A group. 

data have ;                                                                                                                             
  input (A B C) (:$1.) ;                                                                                                                
  cards ;                                                                                                                               
1 1 N                                                                                                                                   
1 2 N                                                                                                                                   
1 3 Y                                                                                                                                   
1 4 N                                                                                                                                   
1 5 N                                                                                                                                   
1 6 N                                                                                                                                   
1 7 Y                                                                                                                                   
1 8 Y                                                                                                                                   
1 9 N                                                                                                                                   
2 0 N                                                                                                                                   
2 1 N                                                                                                                                   
2 2 Y                                                                                                                                   
2 3 N                                                                                                                                   
2 4 N                                                                                                                                   
2 5 N                                                                                                                                   
2 6 Y                                                                                                                                   
2 7 N                                                                                                                                   
2 8 N                                                                                                                                   
2 9 N                                                                                                                                   
3 1 Y                                                                                                                                   
3 2 N                                                                                                                                   
3 3 N                                                                                                                                   
3 4 N                                                                                                                                   
3 5 N                                                                                                                                   
3 6 N                                                                                                                                   
3 7 N                                                                                                                                   
3 8 Y                                                                                                                                   
3 8 Y                                                                                                                                   
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  set have ;                                                                                                                            
  by a ;                                                                                                                                
  if c eq "N" and not _iorc_ then _iorc_ = _n_ ;                                                                                        
  if (c ne "N" or last.a) and _iorc_ ;                                                                                                  
  if _n_ - not last.a - _iorc_ => 2 then do p = _iorc_ to _n_ - not last.a ;                                                            
    set have point = p ;                                                                                                                
    output ;                                                                                                                            
  end ;                                                                                                                                 
  _iorc_ = 0 ;                                                                                                                          
run ;                                      

_IORC_ (auto-retained, auto-dropped, and set to 0 at compile) tracks the record numbers where the "N" sequences begin; _N_ tracks the record numbers where they terminate. This approach can be advantageous performance-wise if the number and size of the qualifying "N" sequences are small compared to the size of the file, as it thus avoids a full second read.

 

Kind regards

Paul D. 

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1050 views
  • 2 likes
  • 6 in conversation