Greetings Community,
I have the following data, which takes four combinations:
1. Not having a closed order (Open) ==> keep the latest observation
2. Having an open and then closed order in last month ==> keep the open order's observation
3. Having a closed order ==> keep the latest observation
Here is what I have:
Condition # | ID | Report Month | Status Date | Status |
1 | 9001 | 02/28/2018 | 02/10/2018 | Open |
1 | 9001 | 03/31/2018 | 03/06/2018 | Open |
1 | 9001 | 04/30/2018 | 04/16/2018 | Open |
2 | 1234 | 01/31/2018 | 01/15/2018 | Open |
2 | 1234 | 02/28/2018 | 02/10/2018 | Open |
2 | 1234 | 03/31/2018 | 03/07/2018 | Open |
2 | 1234 | 04/30/2018 | 04/12/2018 | Closed |
2 | 1234CN | 04/30/2018 | 04/17/2018 | Closed |
3 | 5678 | 01/31/2018 | 01/15/2018 | Open |
3 | 5678 | 02/28/2018 | 02/10/2018 | Open |
3 | 5678CN | 04/30/2018 | 03/19/2018 | Closed |
3 | 5678 | 04/30/2018 | 04/12/2018 | Open |
4 | 4444CN | 02/28/2018 | 02/12/2018 | Closed |
4 | 4444CN | 03/31/2018 | 03/16/2018 | Closed |
4 | 4444CN | 04/30/2018 | 04/09/2018 | Closed |
Here is what I want:
Condition # | ID | Report Month | Status Date | Status |
1 | 9001 | 04/30/2018 | 04/16/2018 | Open |
2 | 1234CN | 04/30/2018 | 04/12/2018 | Closed |
3 | 5678 | 04/30/2018 | 04/12/2018 | Open |
4 | 4444CN | 04/30/2018 | 04/09/2018 | Closed |
Thanks!
data have;
infile cards expandtabs ;
input (Condition ID ReportMonth StatusDate Status) (: $20.);
cards;
1 9001 02/28/2018 02/10/2018 Open
1 9001 03/31/2018 03/06/2018 Open
1 9001 04/30/2018 04/16/2018 Open
2 1234 01/31/2018 01/15/2018 Open
2 1234 02/28/2018 02/10/2018 Open
2 1234 03/31/2018 03/07/2018 Open
2 1234 04/30/2018 04/12/2018 Closed
2 1234CN 04/30/2018 04/17/2018 Closed
3 5678 01/31/2018 01/15/2018 Open
3 5678 02/28/2018 02/10/2018 Open
3 5678CN 04/30/2018 03/19/2018 Closed
3 5678 04/30/2018 04/12/2018 Open
4 4444CN 02/28/2018 02/12/2018 Closed
4 4444CN 03/31/2018 03/16/2018 Closed
4 4444CN 04/30/2018 04/09/2018 Closed
;
run;
data want;
n=0;
do i=1 by 1 until(last.Condition);
set have;
by Condition;
if Status='Open' then idx_open=i;
if Status='Closed' then do;
idx_Closed=i;
n+1;
if n=1 then do;r=ReportMonth;s=StatusDate;end;
end;
end;
if . < idx_open < idx_Closed then do;
ReportMonth=r;StatusDate=s;
end;
drop n i r s idx_:;
run;
It might help to indicate which condition, 1, 2 or 3 each of those kept records represents.
And the fact that you might have multiple values of ID that are treated the same needs some definition such as 1234 and 1234CN, 5678 and 5678CN. Since date sequence is important in processing this data some approaches would likely be looking at sorting the data but your shown behavior for 5678CN means that is not going to work without some additional steps.
Thanks ballardw. I just updated the data with the condition number.
data have;
infile cards expandtabs ;
input (Condition ID ReportMonth StatusDate Status) (: $20.);
cards;
1 9001 02/28/2018 02/10/2018 Open
1 9001 03/31/2018 03/06/2018 Open
1 9001 04/30/2018 04/16/2018 Open
2 1234 01/31/2018 01/15/2018 Open
2 1234 02/28/2018 02/10/2018 Open
2 1234 03/31/2018 03/07/2018 Open
2 1234 04/30/2018 04/12/2018 Closed
2 1234CN 04/30/2018 04/17/2018 Closed
3 5678 01/31/2018 01/15/2018 Open
3 5678 02/28/2018 02/10/2018 Open
3 5678CN 04/30/2018 03/19/2018 Closed
3 5678 04/30/2018 04/12/2018 Open
4 4444CN 02/28/2018 02/12/2018 Closed
4 4444CN 03/31/2018 03/16/2018 Closed
4 4444CN 04/30/2018 04/09/2018 Closed
;
run;
data want;
n=0;
do i=1 by 1 until(last.Condition);
set have;
by Condition;
if Status='Open' then idx_open=i;
if Status='Closed' then do;
idx_Closed=i;
n+1;
if n=1 then do;r=ReportMonth;s=StatusDate;end;
end;
end;
if . < idx_open < idx_Closed then do;
ReportMonth=r;StatusDate=s;
end;
drop n i r s idx_:;
run;
Thanks Ksharp. While this code works for this sample data, I do not actually have the variable condition in my data. I created it here because it was requested by one of the members. Any idea on how to create this "condition" variable, so that your code will work in my data?
Thanks,
You still need to describe why and how the ID variable with different values is to be treated as the same.
I had asked about adding the condition you were treating a number of records because I did not see how the simple values of the ID variable could yield the desired results because they appeared to involve at least two different actual values of the "ID" variable. So I asked for more of a description on how an "ID" has two (or is it more) values that are treated as essentially identical. For instance is it possible that somewhere lurking the data is something like "123CN" that is supposed to also be the "same" id as 1234? Or 1234PQ? or others.
I think that you might need to add an additional variable to actually uniquely identify groups of ID values into a single "group id" for processing.
COMPRESS() can create a new condition variable.
Condition=compress(id,,'kd');
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.