Solved: How to get the observations for subsequent values based on a condition

jksthomas · Posted 11-08-2019 09:28 AM

Can anyone help how to get the observations based on the below condition:

If 3 consecutive "N" is found in column C for a same value in A column

Eg: A1B4 A1B5 and A1B6 has 3 consecutive N's and I need all three in the output

Expected output:

A B C
1 4 N
1 5 N
1 6 N
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N

Ksharp · Posted 11-09-2019 07:35 AM

Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
 set have;
 by a c notsorted;
 group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;

View solution in original post

Astounding · Posted 11-08-2019 10:09 AM

One approach:

data want;
k=0;
do until (last.c) ;
set have;
by a c notsorted;
k + 1;
end;
do until (last.c) ;
set have;
by a c notsorted;
if k >= 3 and c='N' then output;
end;
drop k;
run;

novinosrin · Posted 11-08-2019 10:11 AM

Hi @jksthomas


data have;
input A B C $1.;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;


data want;
 if _n_=1 then do;
 dcl hash H ();
 h.definekey  ("_n_");
 h.definedata ("_s",'_e');
 h.definedone ();
 dcl hiter hi('h');
 end;
 do _n_=1 by 1 until(last.a);
  set have;
  by a;
  if c='N' then	do;
   _count+1;
   if _count=1 then _s=B;
   _e=b;
  end;
  else do;
   if _count>=3 then do;
   h.add();
   end;
   _count=0;
  end;
 end;
 c='N';
 do while(hi.next()=0);
  do b=_s to _e;
   output;
  end;
 end;
 h.clear();
 drop _:;
run;

r_behata · Posted 11-08-2019 10:23 AM

Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;

data want;
	cnt=0;
	do until(last.c);
		set have;
		by c notsorted;
		cnt+1;
	end;

	do until(last.c);
		set have;
		by c notsorted;
		if cnt >=3 then output;
	end;

	drop cnt;
run;

Ksharp · Posted 11-09-2019 07:35 AM

Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
 set have;
 by a c notsorted;
 group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;

hashman · Posted 11-10-2019 03:12 PM

@jksthomas:

You've got plenty of good offers, of which I'd prefer those based on the double DoW-loop and grouping by A C.

Another approach can be:

While passing through the file, memorize the observation number endpoints of each consecutive "N" sequence

At the break of the sequence (i.e. when C="N" or LAST.A), if the distance between the endpoints GE 2, use them to output the needed records.

Note that below, I've included an extra BY-group (A=2) to test the condition when a qualifying "N" sequence ends at the end of a BY A group.

data have ;                                                                                                                             
  input (A B C) (:$1.) ;                                                                                                                
  cards ;                                                                                                                               
1 1 N                                                                                                                                   
1 2 N                                                                                                                                   
1 3 Y                                                                                                                                   
1 4 N                                                                                                                                   
1 5 N                                                                                                                                   
1 6 N                                                                                                                                   
1 7 Y                                                                                                                                   
1 8 Y                                                                                                                                   
1 9 N                                                                                                                                   
2 0 N                                                                                                                                   
2 1 N                                                                                                                                   
2 2 Y                                                                                                                                   
2 3 N                                                                                                                                   
2 4 N                                                                                                                                   
2 5 N                                                                                                                                   
2 6 Y                                                                                                                                   
2 7 N                                                                                                                                   
2 8 N                                                                                                                                   
2 9 N                                                                                                                                   
3 1 Y                                                                                                                                   
3 2 N                                                                                                                                   
3 3 N                                                                                                                                   
3 4 N                                                                                                                                   
3 5 N                                                                                                                                   
3 6 N                                                                                                                                   
3 7 N                                                                                                                                   
3 8 Y                                                                                                                                   
3 8 Y                                                                                                                                   
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data want ;                                                                                                                             
  set have ;                                                                                                                            
  by a ;                                                                                                                                
  if c eq "N" and not _iorc_ then _iorc_ = _n_ ;                                                                                        
  if (c ne "N" or last.a) and _iorc_ ;                                                                                                  
  if _n_ - not last.a - _iorc_ => 2 then do p = _iorc_ to _n_ - not last.a ;                                                            
    set have point = p ;                                                                                                                
    output ;                                                                                                                            
  end ;                                                                                                                                 
  _iorc_ = 0 ;                                                                                                                          
run ;

_IORC_ (auto-retained, auto-dropped, and set to 0 at compile) tracks the record numbers where the "N" sequences begin; _N_ tracks the record numbers where they terminate. This approach can be advantageous performance-wise if the number and size of the qualifying "N" sequences are small compared to the size of the file, as it thus avoids a full second read.

Kind regards

Paul D.

How to get the observations for subsequent values based on a condition

Re: How to get the observations for subsequent values based on a condition

Re: How to get the observations for subsequent values based on a condition

Re: How to get the observations for subsequent values based on a condition

Re: How to get the observations for subsequent values based on a condition

Re: How to get the observations for subsequent values based on a condition

Re: How to get the observations for subsequent values based on a condition

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!