BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sunboyss30
Obsidian | Level 7

Hello members,

 

I am trying to filter out multiple rows from a group of observation where it matches to my need. So i have a dataset where each id will have 6 observations with different codes and the observations i need has specific codes for example A1 and B1 but from these codes also i need observations which are populated after any other code from the above two. Below is the quick snapshot of the dummy data and i need all the observations highlighted in green. Please let me know if i am unable to post my question properly or more details required.

 

Sunboyss30_0-1646833707031.png

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Onyx | Level 15

Do you want something like this:

data want;
  set have;
  by id;

  if first.id then out = 1; retain out; drop out;
  if code in ('Z1' 'X1') then out = 0;
  if out then output;
run;

?

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

11 REPLIES 11
Sunboyss30
Obsidian | Level 7
i was able to create a flag and counter (cnt) for the observations but i need only the ones in green.

maguiremq
SAS Super FREQ

I'm not quite understanding your question. When I look at this, you could just filter where code = `B1` to get what you need. But I'm thinking that there may be additional patterns that are not shown in your example data.

 

Additionally, it helps us to have a reproducible example in the following format. A screenshot isn't helpful for us to resolve your issue, and we can often make mistakes trying to replicate what you have posted in a screenshot.

 

data have;
infile datalines delimiter = ' ';
input id date :date9. code :$2. flag cnt;
format date date9.;
datalines;
1 30NOV2021 B1 1 1
1 31OCT2021 X1 0 0
1 30SEP2021 Z1 0 0
1 31AUG2021 A1 1 1
1 31JUL2021 A1 1 2
1 30JUN2021 A1 1 3
2 30NOV2021 B1 1 1
2 31OCT2021 B1 1 2
2 30SEP2021 B1 1 3
2 31AUG2021 Z1 0 0
2 31JUL2021 A1 1 1
2 30JUN2021 A1 1 2
3 30NOV2021 B1 1 1
3 31OCT2021 B1 1 2
3 30SEP2021 Z1 0 0
3 31AUG2021 Z1 0 0
3 31JUL2021 A1 1 1
3 30JUN2021 A1 1 2
;
run;

You can also use this to make a DATALINES statement from your actual data.

Sunboyss30
Obsidian | Level 7

Thankyou for your reply @maguiremq .Yes the additional patterns can be between (A1 to A7) and (B1 to B7) and i need all of them except if they break in between for example in ID = 1 it had a different code (Z1 & X1) in Sep and Oct in between and i dont need the observations prior to that even if its one of the codes i am looking for. So i cant just simply filter out on 'B1' as a group can have all A1 or B1 codes and that ways i will have extra observations not required. Please see the code i have added 2 more ids to explain.



data have;
infile datalines delimiter = ' ';
input id date :date9. code :$2. flag cnt;
format date date9.;
datalines;
1 30NOV2021 B1 1 1
1 31OCT2021 X1 0 0
1 30SEP2021 Z1 0 0
1 31AUG2021 A1 1 1
1 31JUL2021 A1 1 2
1 30JUN2021 A1 1 3
2 30NOV2021 B1 1 1
2 31OCT2021 B1 1 2
2 30SEP2021 B1 1 3
2 31AUG2021 Z1 0 0
2 31JUL2021 A1 1 1
2 30JUN2021 A1 1 2
3 30NOV2021 B1 1 1
3 31OCT2021 B1 1 2
3 30SEP2021 Z1 0 0
3 31AUG2021 Z1 0 0
3 31JUL2021 A1 1 1
3 30JUN2021 A1 1 2
4 30NOV2021 B1 1 1
4 31OCT2021 B1 1 2
4 30SEP2021 B1 1 3
4 31AUG2021 B1 1 4
4 31JUL2021 B1 1 5
4 30JUN2021 B1 1 6
5 30NOV2021 A3 1 1
5 31OCT2021 A2 1 2
5 30SEP2021 A1 1 3
5 31AUG2021 A1 1 4
5 31JUL2021 A1 1 5
5 30JUN2021 A1 1 6
;
run;

 

If i just filter on the required codes will give extra observations and i need only the ones highlighted.

id date code flag cnt
1 30Nov2021 B1 1 1
1 31Aug2021 A1 1 1
1 31Jul2021 A1 1 2
1 30Jun2021 A1 1 3
2 30Nov2021 B1 1 1
2 31Oct2021 B1 1 2
2 30Sep2021 B1 1 3
2 31Jul2021 A1 1 1
2 30Jun2021 A1 1 2
3 30Nov2021 B1 1 1
3 31Oct2021 B1 1 2
3 31Jul2021 A1 1 1
3 30Jun2021 A1 1 2
yabwon
Onyx | Level 15

Do you want something like this:

data want;
  set have;
  by id;

  if first.id then out = 1; retain out; drop out;
  if code in ('Z1' 'X1') then out = 0;
  if out then output;
run;

?

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Sunboyss30
Obsidian | Level 7

Hi @yabwon Thankyou for your reply. Yes its perfect and that's what i was looking for. Really appreciate.

PaigeMiller
Diamond | Level 26

I do not understand the explanation given as to why your desired result is the green lines. Can you provide more detail on how the green lines are chosen? In addition, you mention A1, but none of the A1 lines are in green, can you also explain that part. Thanks.

--
Paige Miller
Sunboyss30
Obsidian | Level 7

Thankyou for your reply @PaigeMiller . i have tried to add more explanation to the above.

PaigeMiller
Diamond | Level 26

 ... and i dont need the observations prior to that even if its one of the codes i am looking for ...

 

"Prior" in this case means chronologically? So September is prior to October? "Prior" is not being used to indicate prior row in the data set?

 

--
Paige Miller
Sunboyss30
Obsidian | Level 7
Yes in the chronologically order. Meaning if i have any code other than A1-A7 and B1-B7 in between any 6 month cycle. Then only keep the observations per group for the latest months .
maguiremq
SAS Super FREQ

Okay, I see that you mentioned months in here, and that would change things drastically. Accordingly, this solution may not work if you need this sorted by month. This seems to fit your parameters, but I see it as a bit clunky. I don't really like mixing DATA steps and PROC SQL, but sometimes they work well together.

 

data have;
infile datalines delimiter = ' ';
input id date :date9. code :$2. flag cnt;
format date date9.;
datalines;
1 30NOV2021 B1 1 1
1 31OCT2021 X1 0 0
1 30SEP2021 Z1 0 0
1 31AUG2021 A1 1 1
1 31JUL2021 A1 1 2
1 30JUN2021 A1 1 3
2 30NOV2021 B1 1 1
2 31OCT2021 B1 1 2
2 30SEP2021 B1 1 3
2 31AUG2021 Z1 0 0
2 31JUL2021 A1 1 1
2 30JUN2021 A1 1 2
3 30NOV2021 B1 1 1
3 31OCT2021 B1 1 2
3 30SEP2021 Z1 0 0
3 31AUG2021 Z1 0 0
3 31JUL2021 A1 1 1
3 30JUN2021 A1 1 2
4 30NOV2021 B1 1 1
4 31OCT2021 B1 1 2
4 30SEP2021 B1 1 3
4 31AUG2021 B1 1 4
4 31JUL2021 B1 1 5
4 30JUN2021 B1 1 6
5 30NOV2021 A3 1 1
5 31OCT2021 A2 1 2
5 30SEP2021 A1 1 3
5 31AUG2021 A1 1 4
5 31JUL2021 A1 1 5
5 30JUN2021 A1 1 6
;
run;

data 	have_subset;
	set have (keep = id date code);
	by 	id;
		if first.id then seq = 1;
		if id = lag(id) and code ~= lag(code) then seq + 1;
		if code in ('A1', 'B1') then flag = 1;
			else flag = 0;
run;

proc sql;
	create table 	want as
		select
					id,
					date,
					code
		from
					have_subset
		where
					flag = 1
		group by
					id
		having
					seq = min(seq);
quit;

maguiremq_0-1646835637213.png

 

Sunboyss30
Obsidian | Level 7

Thankyou @maguiremq its almost there and i am getting the results as expected with one last thing that in your answer if the code in ('A1','B1') but here i am looking for all the codes between A1-A7 and B1-B7. Since the code is doing seq increment on the basis of previous code not matching. So if i replace the code with below snippet i am only getting one record for the id = 5 whereas it should have all 6 records since the code for that id has A1-A3 which can be upto A7. Apologies if I haven't explained it earlier.

if code in ('A1','A2','A3','A4','A5','A6','A7','B1','B2','B3','B4','B5','B6','B7') 
then flag = 1;

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1082 views
  • 2 likes
  • 4 in conversation