Hello,
I'd appreciate some help regarding user-written base code. Consider the following example input table, which is filtered on ORDER_NO=4687:
ORDER_NO | ORDER_SEQ_NO | ORDER_STATUS | COMMENT | DATE |
---|---|---|---|---|
4687 | 1 | OPP | 15SEP2011 | |
4687 | 2 | FIN | 27SEP2011 | |
4687 | 3 | AB | 05OCT2011 | |
4687 | 4 | FIN | 06OCT2011 | |
4687 | 5 | FIN | Hello world! | 27JAN2014 |
Each time a change is made to an order, a new row is triggered (the one with the highest ORDER_SEQ_NO being the most recent). This ideally only happens when the order status changes. However, if a consultant manually edits the order (here by adding a comment), another row is triggered despite the status not having changed.
So far, my job has always selected the date from the most recent row (27JAN2014), i.e. from the row with the highest ORDER_SEQ_NO. However, I need it to select the first date for the most recently triggered order status. So, in the example above, the logic should select the date 06OCT2011. Not 27SEP2011, as there's been a different status since back then.
I can't simply group by the most recent ORDER_STATUS and select min(DATE), as that would select the date from row 2. I need row 4 from this example, and of course I can't just select ORDER_SEQ_NO=4 directly since the logic must apply to numerous other orders as well.
Consider this code asuming the dataset is sorted by ORDER_NO and ORDER_SEQ_NO:
data have;
input ORDER_NO 4. ORDER_SEQ_NO 2. ORDER_STATUS $3. COMMENT $13. DATE date9.;
format date date9.;
cards;
4687 1OPP 15-Sep-11
4687 2FIN 27-Sep-11
4687 3AB 5-Oct-11
4687 4FIN 6-Oct-11
4687 5FINHello world! 27-Jan-14
;
run;
data want;
set have;
retain STATUS ORDER_DATE;
format order_date date9.;
by order_no;
if first.order_no
then do;
status=order_status;
order_date=date;
end;
else if order_status ne status
then do;
status=order_status;
order_date=date;
end;
if last.order_no then output;
keep order_no status order_date;
run;
So what you want is the most recent.
But if there exist earlier record with the
then this record should be selected.
Is that what you want?
It should select the date from the most recent row, unless an earlier row has the same status sequentially, in which case it should select the date from that row.
You might think of it like this: 'The most recent status change was from AB to FIN. That happened on 06OCT2011, so I select that date."
Consider this code asuming the dataset is sorted by ORDER_NO and ORDER_SEQ_NO:
data have;
input ORDER_NO 4. ORDER_SEQ_NO 2. ORDER_STATUS $3. COMMENT $13. DATE date9.;
format date date9.;
cards;
4687 1OPP 15-Sep-11
4687 2FIN 27-Sep-11
4687 3AB 5-Oct-11
4687 4FIN 6-Oct-11
4687 5FINHello world! 27-Jan-14
;
run;
data want;
set have;
retain STATUS ORDER_DATE;
format order_date date9.;
by order_no;
if first.order_no
then do;
status=order_status;
order_date=date;
end;
else if order_status ne status
then do;
status=order_status;
order_date=date;
end;
if last.order_no then output;
keep order_no status order_date;
run;
CTorres,
Your code is not right . Check the following data :
data have;
input ORDER_NO 4. ORDER_SEQ_NO 2. ORDER_STATUS $3. COMMENT $13. DATE date9.;
format date date9.;
cards;
4687 1OPP 15-Sep-11
4687 2FIN 27-Sep-11
4687 3AB 5-Oct-11
4687 4AB 6-Oct-11
4687 5FINHello world! 27-Jan-14
;
run;
data have; input ORDER_NO ORDER_SEQ_NO ORDER_STATUS $ COMMENT & $18. DATE : date9.; format date date9.; cards; 4687 1 OPP . 15-Sep-11 4687 2 FIN . 27-Sep-11 4687 3 AB . 5-Oct-11 4687 4 FIN . 6-Oct-11 4687 5 FIN Hello world! 27-Jan-14 ; run; proc sql; create table want as select a.* from have as a,(select * from have group by ORDER_NO having ORDER_SEQ_NO=max(ORDER_SEQ_NO)) as b where a.ORDER_NO=b.ORDER_NO and a.ORDER_STATUS=b.ORDER_STATUS and a.ORDER_SEQ_NO ne b.ORDER_SEQ_NO group by a.ORDER_NO having a.ORDER_SEQ_NO=max(a.ORDER_SEQ_NO); quit;
Xia Keshan
Message was edited by: xia keshan Sorry. Missing a GROUP clause.
Xia,
I think the code works fine, even with the change you suggest.
If I understand correctly, the purpose of the program is to get the most recent update record that includes a change of status, independently of any other change.
CTorres
CTorres,
Yeah. If I understand OP correctly . He also want the SAME ORDER_STATUS . But apparently you didn't consider it .
Xia Keshan
data want;
set have;
by ORDER_NO ORDER_SEQ_NO;
if ORDER_STATUS ne lag(ORDER_STATUS) then flag+1;
run;
proc sql;
select order_no,date from want
group by order_no, flag
having ORDER_SEQ_NO=min(ORDER_SEQ_NO)
and count(ORDER_STATUS)>1;
quit;
I think your code does not work for the general case: more than one Order with different number of order changes. For example with the following "have" dataset:
data have;
input ORDER_NO 4. ORDER_SEQ_NO 2. ORDER_STATUS $3. COMMENT $13. DATE date9.;
format date date9.;
cards;
4687 1OPP 15-Sep-11
4687 2FIN 27-Sep-11
4687 3AB 5-Oct-11
4687 4FIN 6-Oct-11
4687 5FINHello world! 27-Jan-14
4999 1FINComplete 20-Aug-14
5680 1OPP 18-Dec-13
5680 2AB 19-Dec-13
5700 1OPP 10-Dec-13
5700 2OPPXXX 20-Jan-14
;
run;
Hi,
I think it should work. Please try to validate.
Regards,
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.