Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- Subsetting in a group of values in sas

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

🔒 This topic is **solved** and **locked**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 03-09-2022 07:50 AM
(1153 views)

Hello members,

I am trying to filter out multiple rows from a group of observation where it matches to my need. So i have a dataset where each id will have 6 observations with different codes and the observations i need has specific codes for example A1 and B1 but from these codes also i need observations which are populated after any other code from the above two. Below is the quick snapshot of the dummy data and i need all the observations highlighted in green. Please let me know if i am unable to post my question properly or more details required.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Do you want something like this:

```
data want;
set have;
by id;
if first.id then out = 1; retain out; drop out;
if code in ('Z1' 'X1') then out = 0;
if out then output;
run;
```

?

Bart

_______________

**Polish SAS Users Group**: www.polsug.com and communities.sas.com/polsug

"**SAS Packages: the way to share**" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.

Hands-on-Workshop: "**Share your code with SAS Packages**"

"**My First SAS Package: A How-To**" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three

SAS Documentation

"

Hands-on-Workshop: "

"

SAS Ballot Ideas: one: SPF in SAS, two, and three

SAS Documentation

11 REPLIES 11

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

i was able to create a flag and counter (cnt) for the observations but i need only the ones in green.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I'm not quite understanding your question. When I look at this, you could just filter where code = `B1` to get what you need. But I'm thinking that there may be additional patterns that are not shown in your example data.

Additionally, it helps us to have a reproducible example in the following format. A screenshot isn't helpful for us to resolve your issue, and we can often make mistakes trying to replicate what you have posted in a screenshot.

```
data have;
infile datalines delimiter = ' ';
input id date :date9. code :$2. flag cnt;
format date date9.;
datalines;
1 30NOV2021 B1 1 1
1 31OCT2021 X1 0 0
1 30SEP2021 Z1 0 0
1 31AUG2021 A1 1 1
1 31JUL2021 A1 1 2
1 30JUN2021 A1 1 3
2 30NOV2021 B1 1 1
2 31OCT2021 B1 1 2
2 30SEP2021 B1 1 3
2 31AUG2021 Z1 0 0
2 31JUL2021 A1 1 1
2 30JUN2021 A1 1 2
3 30NOV2021 B1 1 1
3 31OCT2021 B1 1 2
3 30SEP2021 Z1 0 0
3 31AUG2021 Z1 0 0
3 31JUL2021 A1 1 1
3 30JUN2021 A1 1 2
;
run;
```

You can also use this to make a DATALINES statement from your actual data.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thankyou for your reply @maguiremq .Yes the additional patterns can be between (A1 to A7) and (B1 to B7) and i need all of them except if they break in between for example in ID = 1 it had a different code (Z1 & X1) in Sep and Oct in between and i dont need the observations prior to that even if its one of the codes i am looking for. So i cant just simply filter out on 'B1' as a group can have all A1 or B1 codes and that ways i will have extra observations not required. Please see the code i have added 2 more ids to explain.

data have;

infile datalines delimiter = ' ';

input id date :date9. code :$2. flag cnt;

format date date9.;

datalines;

1 30NOV2021 B1 1 1

1 31OCT2021 X1 0 0

1 30SEP2021 Z1 0 0

1 31AUG2021 A1 1 1

1 31JUL2021 A1 1 2

1 30JUN2021 A1 1 3

2 30NOV2021 B1 1 1

2 31OCT2021 B1 1 2

2 30SEP2021 B1 1 3

2 31AUG2021 Z1 0 0

2 31JUL2021 A1 1 1

2 30JUN2021 A1 1 2

3 30NOV2021 B1 1 1

3 31OCT2021 B1 1 2

3 30SEP2021 Z1 0 0

3 31AUG2021 Z1 0 0

3 31JUL2021 A1 1 1

3 30JUN2021 A1 1 2

4 30NOV2021 B1 1 1

4 31OCT2021 B1 1 2

4 30SEP2021 B1 1 3

4 31AUG2021 B1 1 4

4 31JUL2021 B1 1 5

4 30JUN2021 B1 1 6

5 30NOV2021 A3 1 1

5 31OCT2021 A2 1 2

5 30SEP2021 A1 1 3

5 31AUG2021 A1 1 4

5 31JUL2021 A1 1 5

5 30JUN2021 A1 1 6

;

run;

If i just filter on the required codes will give extra observations and i need only the ones highlighted.

id | date | code | flag | cnt |

1 |
30Nov2021 |
B1 |
1 |
1 |

1 | 31Aug2021 | A1 | 1 | 1 |

1 | 31Jul2021 | A1 | 1 | 2 |

1 | 30Jun2021 | A1 | 1 | 3 |

2 |
30Nov2021 |
B1 |
1 |
1 |

2 |
31Oct2021 |
B1 |
1 |
2 |

2 |
30Sep2021 |
B1 |
1 |
3 |

2 | 31Jul2021 | A1 | 1 | 1 |

2 | 30Jun2021 | A1 | 1 | 2 |

3 |
30Nov2021 |
B1 |
1 |
1 |

3 |
31Oct2021 |
B1 |
1 |
2 |

3 | 31Jul2021 | A1 | 1 | 1 |

3 | 30Jun2021 | A1 | 1 | 2 |

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Do you want something like this:

```
data want;
set have;
by id;
if first.id then out = 1; retain out; drop out;
if code in ('Z1' 'X1') then out = 0;
if out then output;
run;
```

?

Bart

_______________

**Polish SAS Users Group**: www.polsug.com and communities.sas.com/polsug

"**SAS Packages: the way to share**" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.

Hands-on-Workshop: "**Share your code with SAS Packages**"

"**My First SAS Package: A How-To**" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three

SAS Documentation

"

Hands-on-Workshop: "

"

SAS Ballot Ideas: one: SPF in SAS, two, and three

SAS Documentation

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I do not understand the explanation given as to why your desired result is the green lines. Can you provide more detail on how the green lines are chosen? In addition, you mention A1, but none of the A1 lines are in green, can you also explain that part. Thanks.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thankyou for your reply @PaigeMiller . i have tried to add more explanation to the above.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

... and i dont need the observations prior to that even if its one of the codes i am looking for ...

"Prior" in this case means chronologically? So September is prior to October? "Prior" is not being used to indicate prior row in the data set?

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Yes in the chronologically order. Meaning if i have any code other than A1-A7 and B1-B7 in between any 6 month cycle. Then only keep the observations per group for the latest months .

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Okay, I see that you mentioned months in here, and that would change things drastically. Accordingly, this solution may not work if you need this sorted by month. This seems to fit your parameters, but I see it as a bit clunky. I don't really like mixing DATA steps and PROC SQL, but sometimes they work well together.

```
data have;
infile datalines delimiter = ' ';
input id date :date9. code :$2. flag cnt;
format date date9.;
datalines;
1 30NOV2021 B1 1 1
1 31OCT2021 X1 0 0
1 30SEP2021 Z1 0 0
1 31AUG2021 A1 1 1
1 31JUL2021 A1 1 2
1 30JUN2021 A1 1 3
2 30NOV2021 B1 1 1
2 31OCT2021 B1 1 2
2 30SEP2021 B1 1 3
2 31AUG2021 Z1 0 0
2 31JUL2021 A1 1 1
2 30JUN2021 A1 1 2
3 30NOV2021 B1 1 1
3 31OCT2021 B1 1 2
3 30SEP2021 Z1 0 0
3 31AUG2021 Z1 0 0
3 31JUL2021 A1 1 1
3 30JUN2021 A1 1 2
4 30NOV2021 B1 1 1
4 31OCT2021 B1 1 2
4 30SEP2021 B1 1 3
4 31AUG2021 B1 1 4
4 31JUL2021 B1 1 5
4 30JUN2021 B1 1 6
5 30NOV2021 A3 1 1
5 31OCT2021 A2 1 2
5 30SEP2021 A1 1 3
5 31AUG2021 A1 1 4
5 31JUL2021 A1 1 5
5 30JUN2021 A1 1 6
;
run;
data have_subset;
set have (keep = id date code);
by id;
if first.id then seq = 1;
if id = lag(id) and code ~= lag(code) then seq + 1;
if code in ('A1', 'B1') then flag = 1;
else flag = 0;
run;
proc sql;
create table want as
select
id,
date,
code
from
have_subset
where
flag = 1
group by
id
having
seq = min(seq);
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thankyou @maguiremq its almost there and i am getting the results as expected with one last thing that in your answer if the code in ('A1','B1') but here i am looking for all the codes between A1-A7 and B1-B7. Since the code is doing seq increment on the basis of previous code not matching. So if i replace the code with below snippet i am only getting one record for the id = 5 whereas it should have all 6 records since the code for that id has A1-A3 which can be upto A7. Apologies if I haven't explained it earlier.

```
if code in ('A1','A2','A3','A4','A5','A6','A7','B1','B2','B3','B4','B5','B6','B7')
then flag = 1;
```

**SAS Innovate 2025** is scheduled for May 6-9 in Orlando, FL. Sign up to be **first to learn** about the agenda and registration!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Ready to level-up your skills? Choose your own adventure.