Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- How to get the observations for subsequent values based on a condition

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 11-08-2019 09:28 AM
(475 views)

Can anyone help how to get the observations based on the below condition:

If 3 consecutive "N" is found in column C for a same value in A column

Eg: A1B4 A1B5 and A1B6 has 3 consecutive N's and I need all three in the output

```
A B C
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
```

Expected output:

```
A B C
1 4 N
1 5 N
1 6 N
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
```

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

```
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
set have;
by a c notsorted;
group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;
```

5 REPLIES 5

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

One approach:

data want;

k=0;

do until (last.c) ;

set have;

by a c notsorted;

k + 1;

end;

do until (last.c) ;

set have;

by a c notsorted;

if k >= 3 and c='N' then output;

end;

drop k;

run;

data want;

k=0;

do until (last.c) ;

set have;

by a c notsorted;

k + 1;

end;

do until (last.c) ;

set have;

by a c notsorted;

if k >= 3 and c='N' then output;

end;

drop k;

run;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hi @jksthomas

```
data have;
input A B C $1.;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
data want;
if _n_=1 then do;
dcl hash H ();
h.definekey ("_n_");
h.definedata ("_s",'_e');
h.definedone ();
dcl hiter hi('h');
end;
do _n_=1 by 1 until(last.a);
set have;
by a;
if c='N' then do;
_count+1;
if _count=1 then _s=B;
_e=b;
end;
else do;
if _count>=3 then do;
h.add();
end;
_count=0;
end;
end;
c='N';
do while(hi.next()=0);
do b=_s to _e;
output;
end;
end;
h.clear();
drop _:;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

```
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data want;
cnt=0;
do until(last.c);
set have;
by c notsorted;
cnt+1;
end;
do until(last.c);
set have;
by c notsorted;
if cnt >=3 then output;
end;
drop cnt;
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

```
Data have;
input A B C $;
cards;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 1 Y
2 2 N
2 3 N
2 4 N
2 5 N
2 6 N
2 7 N
2 8 Y
;
run;
data temp;
set have;
by a c notsorted;
group+first.c;
run;
proc sql;
create table want as
select * from temp group by group having count(*)>2 and sum(c='Y')=0;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

You've got plenty of good offers, of which I'd prefer those based on the double DoW-loop and grouping by A C.

Another approach can be:

While passing through the file, memorize the observation number endpoints of each consecutive "N" sequence

At the break of the sequence (i.e. when C="N" or LAST.A), if the distance between the endpoints GE 2, use them to output the needed records.

Note that below, I've included an extra BY-group (A=2) to test the condition when a qualifying "N" sequence ends at the end of a BY A group.

```
data have ;
input (A B C) (:$1.) ;
cards ;
1 1 N
1 2 N
1 3 Y
1 4 N
1 5 N
1 6 N
1 7 Y
1 8 Y
1 9 N
2 0 N
2 1 N
2 2 Y
2 3 N
2 4 N
2 5 N
2 6 Y
2 7 N
2 8 N
2 9 N
3 1 Y
3 2 N
3 3 N
3 4 N
3 5 N
3 6 N
3 7 N
3 8 Y
3 8 Y
;
run ;
data want ;
set have ;
by a ;
if c eq "N" and not _iorc_ then _iorc_ = _n_ ;
if (c ne "N" or last.a) and _iorc_ ;
if _n_ - not last.a - _iorc_ => 2 then do p = _iorc_ to _n_ - not last.a ;
set have point = p ;
output ;
end ;
_iorc_ = 0 ;
run ;
```

_IORC_ (auto-retained, auto-dropped, and set to 0 at compile) tracks the record numbers where the "N" sequences begin; _N_ tracks the record numbers where they terminate. This approach can be advantageous performance-wise if the number and size of the qualifying "N" sequences are small compared to the size of the file, as it thus avoids a full second read.

Kind regards

Paul D.

**Available on demand!**

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.