Hi all--
I am working with a very large data set and I need to identify some problem cases. I have to find the cases which are being dropped where the MAX
of CLOSINGS_ID for the CASE_ID group is BLANK/NULL and an earlier CLOSE date (a calculated variable) has been picked up.
The cases I’m trying to hunt down look like this:
| CASE_ID | CLOSING_DT_1 | CLOSING_DT_2 | CLOSINGS_ID | CLOSE | 
| 111 | . | 7-Dec-11 | 1 | 7-Dec-11 | 
| 111 | 7-Oct-11 | . | 2 | 7-Oct-11 | 
| 111 | . | 3 | . | |
| 111 | . | . | 4 | . | 
I am using these two programs below. One with the HAVING clause and then one without the HAVING clause and then trying all sorts of different ways to combine them together to find with cases which are dropped when the have HAVING clause is used. I can’t figure it out. I just want a list of all these problem cases. Is there a way to do this is Base SAS as well as using Proc SQL as well?
Any help is greatly appreciated!!
Proc sql;
create table With_HAVING as
select distinct
CASE_ ID,
CLOSING_DT_1,
CLOSING_DT_2
CLOSINGS_ID,
case when CLOSING_DT_2 gt .then CLOSING_DT_2
when CLOSING_DT_1 gt . then CLOSING_DT_1 end as CLOSE
from DATA_HAVE
group by CASE_ ID, CLOSINGS_ID
having CLOSINGS_ID =max(CLOSINGS_ID)
;quit;
Proc sql;
create table With_NO_HAVING as
select distinct
CASE_ ID,
CLOSING_DT_1,
CLOSING_DT_2
CLOSINGS_ID,
case when CLOSING_DT_2 gt .then CLOSING_DT_2
when CLOSING_DT_1 gt . then CLOSING_DT_1 end as CLOSE
from DATA_HAVE
;quit;
Maybe you could explain it more detail.
What is BLANK/NULL when you refer to MAX .
And what output would you like to see and why you have to use DATA STEP ?
Ksharp
Thanks KSharp. By Blank/Null I am on referring to all the cases where CLOSE is . When I take the highest (MAX) CLOSINGS_ID.
| CASE_ID | CLOSING_DT_1 | CLOSING_DT_2 | CLOSINGS_ID | CLOSE | 
| 111 | . | ##### | 1 | 7-Dec-11 | 
| 111 | 7-Oct-11 | . | 2 | 7-Oct-11 | 
| 111 | . | 3 | . | |
| 111 | . | . | 4 | . | 
So, when I take the MAX CLOSINGS_ID and group by CASE_ID I'm actually dropping a close date when I should pick it up.
| CASE_ID | CLOSING_DT_1 | CLOSING_DT_2 | CLOSINGS_ID | CLOSE | 
| 111 | . | . | 4 | . | 
The output I need is just a data set.
is the above helpful?
Thanks.
Jake,
This is actually something that a DATA step does pretty easily. Subject to my understanding the problem correctly, here is one attempt that you can tweak as you see fit. The drawback: it relies on the incoming data being sorted by CASE_ID.
data problems;
set have;
by case_id;
if first.case_id then do;
earlier_dt1 = .;
earlier_dt2 = .;
end;
retain earlier_dt1 earlier_dt2;
if closing_dt1 > . then earlier_dt1 = closing_dt1;
if closing_dt2 > . then earlier_dt2 = closing_dt2;
if last.case_id and (closing_dt1 = closing_dt2 = .) and (earlier_dt1 > . or earlier_dt2 > .);
run;
The logic of which cases are selected may be exactly what you want, or may be very close. But you can test it and see how close it comes.
Good luck.
It is easy for SQL. If I understand your question.
data have; infile cards expandtabs truncover; input CASE_ID CLOSING_DT_1 : date9. CLOSING_DT_2 : date9. CLOSINGS_ID ; format CLOSING_DT_1 CLOSING_DT_2 date11.; cards; 111 . 7-Dec-11 1 111 7-Oct-11 . 2 111 . . 3 111 . . 4 ; run; proc sql; create table close as select *,coalesce(CLOSING_DT_1,CLOSING_DT_2) as CLOSE format date11. from have; create table dropped as select * from close group by CASE_ID having CLOSINGS_ID=max(CLOSINGS_ID) and CLOSE is missing; create table pick_up as select * from close except select * from dropped ; quit;
Ksharp
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
