BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sanflo
Fluorite | Level 6

Hi everyone,

 

I have a large set of data that I require sub setting based on specific conditions being met. These conditions are keeping all records between (and including) the first time each UID enters STATUS "A" until the first time the same UID enters STATUS "D". If a record has never reached record D then it would not be included in the data.

 

Below I have included a sample of my data in the HAVE table with what the desired output would look like in WANT.

 

Many thanks in advance,

 

Sandy

 

DATA WORK.HAVE;
 INPUT UID $ STATUS $ DATE: DATE9.;
 FORMAT DATE DATE9.;
 INFILE DATALINES DLM=',';
 DATALINES;
1,A,01NOV2018
1,A,01NOV2018
1,A,03NOV2018
1,B,04NOV2018
1,C,05NOV2018
1,D,06NOV2018
1,E,07NOV2018
2,A,01NOV2018
2,B,03NOV2018
2,C,04NOV2018
2,D,05NOV2018
2,A,05NOV2018
2,D,06NOV2018
3,A,01NOV2018
3,B,02NOV2018
3,C,03NOV2018
;
RUN;

DATA WORK.WANT;
 INPUT UID $ STATUS $ DATE: DATE9.;
 FORMAT DATE DATE9.;
 INFILE DATALINES DLM=',';
 DATALINES;
1,A,01NOV2018
1,A,01NOV2018
1,A,03NOV2018
1,B,04NOV2018
1,C,05NOV2018
1,D,06NOV2018
2,A,01NOV2018
2,B,03NOV2018
2,C,04NOV2018
2,D,05NOV2018
;
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
DATA WORK.HAVE;
 INPUT UID $ STATUS $ DATE: DATE9.;
 FORMAT DATE DATE9.;
 INFILE DATALINES DLM=',';
 DATALINES;
1,A,01NOV2018
1,A,01NOV2018
1,A,03NOV2018
1,B,04NOV2018
1,C,05NOV2018
1,D,06NOV2018
1,E,07NOV2018
2,A,01NOV2018
2,B,03NOV2018
2,C,04NOV2018
2,D,05NOV2018
2,A,05NOV2018
2,D,06NOV2018
3,A,01NOV2018
3,B,02NOV2018
3,C,03NOV2018
;
RUN;

data want;
length _s $1;
do until(last.uid);
set have;
by uid STATUS notsorted;
if status='A' and not _t1 then _t1=date;
else if  not _t2 and status='D' then _t2=date;
end;
do until(last.uid);
set have;
by uid STATUS notsorted;
if n(of _t:)=2 and _t1<=date<=_t2 and missing(_s) then output;
if status='D' then _s=status;
end;
drop _:;
run;

View solution in original post

4 REPLIES 4
novinosrin
Tourmaline | Level 20

Hi @Sanflo First off, welcome to SAS communities. Secondly, Thank you & hats off to you for a clean post with neat samples. You deserve credit for that. My oh my, didn't expect that from a new member. 

 

Ok, a small clarification plz- 

1 Just like you have multiples of status=A in UID 1, can there be multiples of status=D for a by group?

2. If the above is yes, So the records to pick should be earliest A to latest D or earliest D?Quoting your line-the first time the same UID enters STATUS "D". is indeed clear but I still want to make sure. 

 

Also, in the output sample

2,D,05NOV2018
2,A,05NOV2018    will this not qualify as it falls on the same date as that of D? 

 

 

Thanks & welcome again

 

novinosrin
Tourmaline | Level 20
DATA WORK.HAVE;
 INPUT UID $ STATUS $ DATE: DATE9.;
 FORMAT DATE DATE9.;
 INFILE DATALINES DLM=',';
 DATALINES;
1,A,01NOV2018
1,A,01NOV2018
1,A,03NOV2018
1,B,04NOV2018
1,C,05NOV2018
1,D,06NOV2018
1,E,07NOV2018
2,A,01NOV2018
2,B,03NOV2018
2,C,04NOV2018
2,D,05NOV2018
2,A,05NOV2018
2,D,06NOV2018
3,A,01NOV2018
3,B,02NOV2018
3,C,03NOV2018
;
RUN;

data want;
length _s $1;
do until(last.uid);
set have;
by uid STATUS notsorted;
if status='A' and not _t1 then _t1=date;
else if  not _t2 and status='D' then _t2=date;
end;
do until(last.uid);
set have;
by uid STATUS notsorted;
if n(of _t:)=2 and _t1<=date<=_t2 and missing(_s) then output;
if status='D' then _s=status;
end;
drop _:;
run;
Sanflo
Fluorite | Level 6

Hi @novinosrin,

 

First, thanks for the kind words and answers to your questions from prior:

 

1. Yes there can be multiples of D for a by group. We want to stop when it finds the first instance of this (at least just now).

2. It would be the earliest A to the earliest D.

 

Next thank you so much for your solution provided. It worked perfectly in with both the sample data and also when I modified it to my live data. If you don't mind may I ask you another question?

 

- I realised once I ran it that my original requirement that keeping all records up to and including "D" isn't quite correct. I should find all records up to "D" but then not including "D" of the by group. I updated the below line of your code to change the 2nd "<=" to a "<". After testing this everything looks perfect in the data. Would you be able to confirm that this would be correct and perhaps explain how the code works a little?

 

 

Again much appreciated for the code and your time!

 

Sandy

 

IF N(OF _T:)=2 AND _T1<=DATEUPDATED<_T2 and MISSING(_S) THEN OUTPUT;
novinosrin
Tourmaline | Level 20

 

Hi @Sanflo  Well done, The change you did was neat and correct. 

 

Basically, The code is pretty straight forward as your sample is sorted by ascending date, uses two passes(executing a by group twice hence you see do until*twice).

On the first pass, we determine the first date for A and the first date for D. We hold these two dates in temporary variables that we would use for conditional check and output on the 2nd pass of the same by group and eventually drop them at the end.

The N(of _t: ) determines the number of non missing values as if one were to be missing, we are either missing A or D. Since we need both, it has be 2, so that's a check in place.

Then it's pretty much the simple between-and filter that does our need.

 

Now for your requirement's minor change(up-to to D but excluding D), the code further simplifies to:

data want;
do until(last.uid);
set have;
by uid STATUS notsorted;
if status='A' and not _t1 then _t1=date;
else if  not _t2 and status='D' then _t2=date;
end;
do until(last.uid);
set have;
by uid STATUS notsorted;
if n(of _t:)=2 and _t1<=date<_t2  then output;
end;
drop _:;
run;

for the reason that we are not concerned about the D date anymore which may have another status like A falling on the same date. Nonetheless, please do test thoroughly and feel free to come back to us if you need more help.

 

HTH & Regards!

 

 

 


@Sanflo wrote:

Hi @novinosrin,

 

First, thanks for the kind words and answers to your questions from prior:

 

1. Yes there can be multiples of D for a by group. We want to stop when it finds the first instance of this (at least just now).

2. It would be the earliest A to the earliest D.

 

Next thank you so much for your solution provided. It worked perfectly in with both the sample data and also when I modified it to my live data. If you don't mind may I ask you another question?

 

- I realised once I ran it that my original requirement that keeping all records up to and including "D" isn't quite correct. I should find all records up to "D" but then not including "D" of the by group. I updated the below line of your code to change the 2nd "<=" to a "<". After testing this everything looks perfect in the data. Would you be able to confirm that this would be correct and perhaps explain how the code works a little?

 

 

Again much appreciated for the code and your time!

 

Sandy

 

IF N(OF _T:)=2 AND _T1<=DATEUPDATED<_T2 and MISSING(_S) THEN OUTPUT;

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1092 views
  • 3 likes
  • 2 in conversation