Solved: Re: Selecting all records between two dates with conditions

Sanflo · Posted 11-28-2018 02:22 PM

Hi everyone,

I have a large set of data that I require sub setting based on specific conditions being met. These conditions are keeping all records between (and including) the first time each UID enters STATUS "A" until the first time the same UID enters STATUS "D". If a record has never reached record D then it would not be included in the data.

Below I have included a sample of my data in the HAVE table with what the desired output would look like in WANT.

Many thanks in advance,

Sandy

DATA WORK.HAVE;
 INPUT UID $ STATUS $ DATE: DATE9.;
 FORMAT DATE DATE9.;
 INFILE DATALINES DLM=',';
 DATALINES;
1,A,01NOV2018
1,A,01NOV2018
1,A,03NOV2018
1,B,04NOV2018
1,C,05NOV2018
1,D,06NOV2018
1,E,07NOV2018
2,A,01NOV2018
2,B,03NOV2018
2,C,04NOV2018
2,D,05NOV2018
2,A,05NOV2018
2,D,06NOV2018
3,A,01NOV2018
3,B,02NOV2018
3,C,03NOV2018
;
RUN;

DATA WORK.WANT;
 INPUT UID $ STATUS $ DATE: DATE9.;
 FORMAT DATE DATE9.;
 INFILE DATALINES DLM=',';
 DATALINES;
1,A,01NOV2018
1,A,01NOV2018
1,A,03NOV2018
1,B,04NOV2018
1,C,05NOV2018
1,D,06NOV2018
2,A,01NOV2018
2,B,03NOV2018
2,C,04NOV2018
2,D,05NOV2018
;
RUN;

novinosrin · Posted 11-28-2018 03:04 PM

DATA WORK.HAVE;
 INPUT UID $ STATUS $ DATE: DATE9.;
 FORMAT DATE DATE9.;
 INFILE DATALINES DLM=',';
 DATALINES;
1,A,01NOV2018
1,A,01NOV2018
1,A,03NOV2018
1,B,04NOV2018
1,C,05NOV2018
1,D,06NOV2018
1,E,07NOV2018
2,A,01NOV2018
2,B,03NOV2018
2,C,04NOV2018
2,D,05NOV2018
2,A,05NOV2018
2,D,06NOV2018
3,A,01NOV2018
3,B,02NOV2018
3,C,03NOV2018
;
RUN;

data want;
length _s $1;
do until(last.uid);
set have;
by uid STATUS notsorted;
if status='A' and not _t1 then _t1=date;
else if  not _t2 and status='D' then _t2=date;
end;
do until(last.uid);
set have;
by uid STATUS notsorted;
if n(of _t:)=2 and _t1<=date<=_t2 and missing(_s) then output;
if status='D' then _s=status;
end;
drop _:;
run;

View solution in original post

novinosrin · Posted 11-28-2018 02:33 PM

Hi @Sanflo First off, welcome to SAS communities. Secondly, Thank you & hats off to you for a clean post with neat samples. You deserve credit for that. My oh my, didn't expect that from a new member.

Ok, a small clarification plz-

1 Just like you have multiples of status=A in UID 1, can there be multiples of status=D for a by group?

2. If the above is yes, So the records to pick should be earliest A to latest D or earliest D?Quoting your line-the first time the same UID enters STATUS "D". is indeed clear but I still want to make sure.

Also, in the output sample

2,D,05NOV2018
2,A,05NOV2018 will this not qualify as it falls on the same date as that of D?

Thanks & welcome again

novinosrin · Posted 11-28-2018 03:04 PM

DATA WORK.HAVE;
 INPUT UID $ STATUS $ DATE: DATE9.;
 FORMAT DATE DATE9.;
 INFILE DATALINES DLM=',';
 DATALINES;
1,A,01NOV2018
1,A,01NOV2018
1,A,03NOV2018
1,B,04NOV2018
1,C,05NOV2018
1,D,06NOV2018
1,E,07NOV2018
2,A,01NOV2018
2,B,03NOV2018
2,C,04NOV2018
2,D,05NOV2018
2,A,05NOV2018
2,D,06NOV2018
3,A,01NOV2018
3,B,02NOV2018
3,C,03NOV2018
;
RUN;

data want;
length _s $1;
do until(last.uid);
set have;
by uid STATUS notsorted;
if status='A' and not _t1 then _t1=date;
else if  not _t2 and status='D' then _t2=date;
end;
do until(last.uid);
set have;
by uid STATUS notsorted;
if n(of _t:)=2 and _t1<=date<=_t2 and missing(_s) then output;
if status='D' then _s=status;
end;
drop _:;
run;

Sanflo · Posted 11-28-2018 06:09 PM

Hi @novinosrin,

First, thanks for the kind words and answers to your questions from prior:

1. Yes there can be multiples of D for a by group. We want to stop when it finds the first instance of this (at least just now).

2. It would be the earliest A to the earliest D.

Next thank you so much for your solution provided. It worked perfectly in with both the sample data and also when I modified it to my live data. If you don't mind may I ask you another question?

- I realised once I ran it that my original requirement that keeping all records up to and including "D" isn't quite correct. I should find all records up to "D" but then not including "D" of the by group. I updated the below line of your code to change the 2nd "<=" to a "<". After testing this everything looks perfect in the data. Would you be able to confirm that this would be correct and perhaps explain how the code works a little?

Again much appreciated for the code and your time!

Sandy

IF N(OF _T:)=2 AND _T1<=DATEUPDATED<_T2 and MISSING(_S) THEN OUTPUT;

novinosrin · Posted 11-28-2018 06:26 PM

Hi @Sanflo Well done, The change you did was neat and correct.

Basically, The code is pretty straight forward as your sample is sorted by ascending date, uses two passes(executing a by group twice hence you see do until*twice).

On the first pass, we determine the first date for A and the first date for D. We hold these two dates in temporary variables that we would use for conditional check and output on the 2nd pass of the same by group and eventually drop them at the end.

The N(of _t: ) determines the number of non missing values as if one were to be missing, we are either missing A or D. Since we need both, it has be 2, so that's a check in place.

Then it's pretty much the simple between-and filter that does our need.

Now for your requirement's minor change(up-to to D but excluding D), the code further simplifies to:

data want;
do until(last.uid);
set have;
by uid STATUS notsorted;
if status='A' and not _t1 then _t1=date;
else if  not _t2 and status='D' then _t2=date;
end;
do until(last.uid);
set have;
by uid STATUS notsorted;
if n(of _t:)=2 and _t1<=date<_t2  then output;
end;
drop _:;
run;

for the reason that we are not concerned about the D date anymore which may have another status like A falling on the same date. Nonetheless, please do test thoroughly and feel free to come back to us if you need more help.

HTH & Regards!

@Sanflo wrote:

Hi @novinosrin,

First, thanks for the kind words and answers to your questions from prior:

1. Yes there can be multiples of D for a by group. We want to stop when it finds the first instance of this (at least just now).

2. It would be the earliest A to the earliest D.

Next thank you so much for your solution provided. It worked perfectly in with both the sample data and also when I modified it to my live data. If you don't mind may I ask you another question?

- I realised once I ran it that my original requirement that keeping all records up to and including "D" isn't quite correct. I should find all records up to "D" but then not including "D" of the by group. I updated the below line of your code to change the 2nd "<=" to a "<". After testing this everything looks perfect in the data. Would you be able to confirm that this would be correct and perhaps explain how the code works a little?

Again much appreciated for the code and your time!

Sandy
IF N(OF _T:)=2 AND _T1<=DATEUPDATED<_T2 and MISSING(_S) THEN OUTPUT;

Selecting all records between two dates with conditions

Re: Selecting all records between two dates with conditions

Re: Selecting all records between two dates with conditions

Re: Selecting all records between two dates with conditions

Re: Selecting all records between two dates with conditions

Re: Selecting all records between two dates with conditions

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!