Hunting Down Problem Cases in a Large Data set

JakeAZ · Posted 07-29-2012 03:29 PM

Hi all--

I am working with a very large data set and I need to identify some problem cases. I have to find the cases which are being dropped where the MAX

of CLOSINGS_ID for the CASE_ID group is BLANK/NULL and an earlier CLOSE date (a calculated variable) has been picked up.

The cases I’m trying to hunt down look like this:

CASE_ID	CLOSING_DT_1	CLOSING_DT_2	CLOSINGS_ID	CLOSE
111	.	7-Dec-11	1	7-Dec-11
111	7-Oct-11	.	2	7-Oct-11
111	.		3	.
111	.	.	4	.

I am using these two programs below. One with the HAVING clause and then one without the HAVING clause and then trying all sorts of different ways to combine them together to find with cases which are dropped when the have HAVING clause is used. I can’t figure it out. I just want a list of all these problem cases. Is there a way to do this is Base SAS as well as using Proc SQL as well?

Any help is greatly appreciated!!

Proc sql;

create table With_HAVING as

select distinct

CASE_ ID,

CLOSING_DT_1,

CLOSING_DT_2

CLOSINGS_ID,

case when CLOSING_DT_2 gt .then CLOSING_DT_2

when CLOSING_DT_1 gt . then CLOSING_DT_1 end as CLOSE

from DATA_HAVE

group by CASE_ ID, CLOSINGS_ID

having CLOSINGS_ID =max(CLOSINGS_ID)

;quit;

Proc sql;

create table With_NO_HAVING as

select distinct

CASE_ ID,

CLOSING_DT_1,

CLOSING_DT_2

CLOSINGS_ID,

case when CLOSING_DT_2 gt .then CLOSING_DT_2

when CLOSING_DT_1 gt . then CLOSING_DT_1 end as CLOSE

from DATA_HAVE

;quit;

Ksharp · Posted 07-30-2012 12:45 AM

Maybe you could explain it more detail.

What is BLANK/NULL when you refer to MAX .

And what output would you like to see and why you have to use DATA STEP ?

Ksharp

JakeAZ · Posted 07-30-2012 08:32 AM

Thanks KSharp. By Blank/Null I am on referring to all the cases where CLOSE is . When I take the highest (MAX) CLOSINGS_ID.

CASE_ID	CLOSING_DT_1	CLOSING_DT_2	CLOSINGS_ID	CLOSE
111	.	#####	1	7-Dec-11
111	7-Oct-11	.	2	7-Oct-11
111	.		3	.
111	.	.	4	.

So, when I take the MAX CLOSINGS_ID and group by CASE_ID I'm actually dropping a close date when I should pick it up.

CASE_ID	CLOSING_DT_1	CLOSING_DT_2	CLOSINGS_ID	CLOSE
111	.	.	4	.

The output I need is just a data set.

is the above helpful?

Thanks.

Astounding · Posted 07-30-2012 10:39 AM

Jake,

This is actually something that a DATA step does pretty easily. Subject to my understanding the problem correctly, here is one attempt that you can tweak as you see fit. The drawback: it relies on the incoming data being sorted by CASE_ID.

data problems;

set have;

by case_id;

if first.case_id then do;

earlier_dt1 = .;

earlier_dt2 = .;

end;

retain earlier_dt1 earlier_dt2;

if closing_dt1 > . then earlier_dt1 = closing_dt1;

if closing_dt2 > . then earlier_dt2 = closing_dt2;

if last.case_id and (closing_dt1 = closing_dt2 = .) and (earlier_dt1 > . or earlier_dt2 > .);

run;

The logic of which cases are selected may be exactly what you want, or may be very close. But you can test it and see how close it comes.

Good luck.

Ksharp · Posted 07-30-2012 10:57 PM

It is easy for SQL. If I understand your question.

data have;
infile cards expandtabs truncover;
input CASE_ID     CLOSING_DT_1 : date9.     CLOSING_DT_2 : date9.     CLOSINGS_ID     ;
format  CLOSING_DT_1 CLOSING_DT_2  date11.;
cards;
111     .     7-Dec-11     1     
111     7-Oct-11     .     2
111     .     .     3     
111     .     .     4
;
run;
proc sql;
create table close as
 select *,coalesce(CLOSING_DT_1,CLOSING_DT_2) as CLOSE format date11.
  from have;

create table dropped as
 select * 
  from close 
   group by CASE_ID
    having CLOSINGS_ID=max(CLOSINGS_ID) and CLOSE is missing;

create table pick_up as
 select * from close
  except
 select * from dropped ;

quit;

Ksharp

Hunting Down Problem Cases in a Large Data set

Re: Hunting Down Problem Cases in a Large Data set

Re: Hunting Down Problem Cases in a Large Data set

Re: Hunting Down Problem Cases in a Large Data set

Re: Hunting Down Problem Cases in a Large Data set

Hunting Down Problem Cases in a Large Data set

Re: Hunting Down Problem Cases in a Large Data set

Re: Hunting Down Problem Cases in a Large Data set

Re: Hunting Down Problem Cases in a Large Data set

Re: Hunting Down Problem Cases in a Large Data set

Registration is open

SAS Training: Just a Click Away