Hello members,
I am trying to filter out multiple rows from a group of observation where it matches to my need. So i have a dataset where each id will have 6 observations with different codes and the observations i need has specific codes for example A1 and B1 but from these codes also i need observations which are populated after any other code from the above two. Below is the quick snapshot of the dummy data and i need all the observations highlighted in green. Please let me know if i am unable to post my question properly or more details required.
Do you want something like this:
data want;
set have;
by id;
if first.id then out = 1; retain out; drop out;
if code in ('Z1' 'X1') then out = 0;
if out then output;
run;
?
Bart
I'm not quite understanding your question. When I look at this, you could just filter where code = `B1` to get what you need. But I'm thinking that there may be additional patterns that are not shown in your example data.
Additionally, it helps us to have a reproducible example in the following format. A screenshot isn't helpful for us to resolve your issue, and we can often make mistakes trying to replicate what you have posted in a screenshot.
data have;
infile datalines delimiter = ' ';
input id date :date9. code :$2. flag cnt;
format date date9.;
datalines;
1 30NOV2021 B1 1 1
1 31OCT2021 X1 0 0
1 30SEP2021 Z1 0 0
1 31AUG2021 A1 1 1
1 31JUL2021 A1 1 2
1 30JUN2021 A1 1 3
2 30NOV2021 B1 1 1
2 31OCT2021 B1 1 2
2 30SEP2021 B1 1 3
2 31AUG2021 Z1 0 0
2 31JUL2021 A1 1 1
2 30JUN2021 A1 1 2
3 30NOV2021 B1 1 1
3 31OCT2021 B1 1 2
3 30SEP2021 Z1 0 0
3 31AUG2021 Z1 0 0
3 31JUL2021 A1 1 1
3 30JUN2021 A1 1 2
;
run;
You can also use this to make a DATALINES statement from your actual data.
Thankyou for your reply @maguiremq .Yes the additional patterns can be between (A1 to A7) and (B1 to B7) and i need all of them except if they break in between for example in ID = 1 it had a different code (Z1 & X1) in Sep and Oct in between and i dont need the observations prior to that even if its one of the codes i am looking for. So i cant just simply filter out on 'B1' as a group can have all A1 or B1 codes and that ways i will have extra observations not required. Please see the code i have added 2 more ids to explain.
data have;
infile datalines delimiter = ' ';
input id date :date9. code :$2. flag cnt;
format date date9.;
datalines;
1 30NOV2021 B1 1 1
1 31OCT2021 X1 0 0
1 30SEP2021 Z1 0 0
1 31AUG2021 A1 1 1
1 31JUL2021 A1 1 2
1 30JUN2021 A1 1 3
2 30NOV2021 B1 1 1
2 31OCT2021 B1 1 2
2 30SEP2021 B1 1 3
2 31AUG2021 Z1 0 0
2 31JUL2021 A1 1 1
2 30JUN2021 A1 1 2
3 30NOV2021 B1 1 1
3 31OCT2021 B1 1 2
3 30SEP2021 Z1 0 0
3 31AUG2021 Z1 0 0
3 31JUL2021 A1 1 1
3 30JUN2021 A1 1 2
4 30NOV2021 B1 1 1
4 31OCT2021 B1 1 2
4 30SEP2021 B1 1 3
4 31AUG2021 B1 1 4
4 31JUL2021 B1 1 5
4 30JUN2021 B1 1 6
5 30NOV2021 A3 1 1
5 31OCT2021 A2 1 2
5 30SEP2021 A1 1 3
5 31AUG2021 A1 1 4
5 31JUL2021 A1 1 5
5 30JUN2021 A1 1 6
;
run;
If i just filter on the required codes will give extra observations and i need only the ones highlighted.
id | date | code | flag | cnt |
1 | 30Nov2021 | B1 | 1 | 1 |
1 | 31Aug2021 | A1 | 1 | 1 |
1 | 31Jul2021 | A1 | 1 | 2 |
1 | 30Jun2021 | A1 | 1 | 3 |
2 | 30Nov2021 | B1 | 1 | 1 |
2 | 31Oct2021 | B1 | 1 | 2 |
2 | 30Sep2021 | B1 | 1 | 3 |
2 | 31Jul2021 | A1 | 1 | 1 |
2 | 30Jun2021 | A1 | 1 | 2 |
3 | 30Nov2021 | B1 | 1 | 1 |
3 | 31Oct2021 | B1 | 1 | 2 |
3 | 31Jul2021 | A1 | 1 | 1 |
3 | 30Jun2021 | A1 | 1 | 2 |
Do you want something like this:
data want;
set have;
by id;
if first.id then out = 1; retain out; drop out;
if code in ('Z1' 'X1') then out = 0;
if out then output;
run;
?
Bart
Hi @yabwon Thankyou for your reply. Yes its perfect and that's what i was looking for. Really appreciate.
I do not understand the explanation given as to why your desired result is the green lines. Can you provide more detail on how the green lines are chosen? In addition, you mention A1, but none of the A1 lines are in green, can you also explain that part. Thanks.
Thankyou for your reply @PaigeMiller . i have tried to add more explanation to the above.
... and i dont need the observations prior to that even if its one of the codes i am looking for ...
"Prior" in this case means chronologically? So September is prior to October? "Prior" is not being used to indicate prior row in the data set?
Okay, I see that you mentioned months in here, and that would change things drastically. Accordingly, this solution may not work if you need this sorted by month. This seems to fit your parameters, but I see it as a bit clunky. I don't really like mixing DATA steps and PROC SQL, but sometimes they work well together.
data have;
infile datalines delimiter = ' ';
input id date :date9. code :$2. flag cnt;
format date date9.;
datalines;
1 30NOV2021 B1 1 1
1 31OCT2021 X1 0 0
1 30SEP2021 Z1 0 0
1 31AUG2021 A1 1 1
1 31JUL2021 A1 1 2
1 30JUN2021 A1 1 3
2 30NOV2021 B1 1 1
2 31OCT2021 B1 1 2
2 30SEP2021 B1 1 3
2 31AUG2021 Z1 0 0
2 31JUL2021 A1 1 1
2 30JUN2021 A1 1 2
3 30NOV2021 B1 1 1
3 31OCT2021 B1 1 2
3 30SEP2021 Z1 0 0
3 31AUG2021 Z1 0 0
3 31JUL2021 A1 1 1
3 30JUN2021 A1 1 2
4 30NOV2021 B1 1 1
4 31OCT2021 B1 1 2
4 30SEP2021 B1 1 3
4 31AUG2021 B1 1 4
4 31JUL2021 B1 1 5
4 30JUN2021 B1 1 6
5 30NOV2021 A3 1 1
5 31OCT2021 A2 1 2
5 30SEP2021 A1 1 3
5 31AUG2021 A1 1 4
5 31JUL2021 A1 1 5
5 30JUN2021 A1 1 6
;
run;
data have_subset;
set have (keep = id date code);
by id;
if first.id then seq = 1;
if id = lag(id) and code ~= lag(code) then seq + 1;
if code in ('A1', 'B1') then flag = 1;
else flag = 0;
run;
proc sql;
create table want as
select
id,
date,
code
from
have_subset
where
flag = 1
group by
id
having
seq = min(seq);
quit;
Thankyou @maguiremq its almost there and i am getting the results as expected with one last thing that in your answer if the code in ('A1','B1') but here i am looking for all the codes between A1-A7 and B1-B7. Since the code is doing seq increment on the basis of previous code not matching. So if i replace the code with below snippet i am only getting one record for the id = 5 whereas it should have all 6 records since the code for that id has A1-A3 which can be upto A7. Apologies if I haven't explained it earlier.
if code in ('A1','A2','A3','A4','A5','A6','A7','B1','B2','B3','B4','B5','B6','B7')
then flag = 1;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.