Desktop productivity for business analysts and programmers

Complicated Data Cleaning Code

Posts: 0

Complicated Data Cleaning Code

Another complicated question.

Month-end files contain borrower's status: Defaulted, Non-default, Just defaulted in the current month. Supposedly, borrowers cannot cure their loan within 6 months. That means, if he defualted at 01/2008 and was maked as:

01/2008 - Defaulted at current month
02/2008 - Defaulted
03/2008 - Defaulted
04/2008 - Non-default
05/2008 - Defaulted
06/2008 - Defaulted

The record in 04/2008 must be wrong. How can i detect it? so complicated...hope someone can help! Thanks!
Esteemed Advisor
Posts: 5,198

Re: Complicated Data Cleaning Code

You can use RETAIN to keep track on the date you want to start you 6 month period. It would help if you transform your MM/YYYYY to SAS date format, then you can use INTCK function or calculate the number of days between the current status date and the original defaulted date.

Data never sleeps
Respected Advisor
Posts: 3,832

Re: Complicated Data Cleaning Code

The LAG() function might also be of some help for you.
Posts: 0

Re: Complicated Data Cleaning Code

can anyone please share more SAS codes for this question because I am still confused about the code. Thanks a lot!
Super Contributor
Super Contributor
Posts: 3,174

Re: Complicated Data Cleaning Code

Suggest you share info about your actual data, ideally in a DATA step format, using DATALINES for an instream data example, and show an INPUT statement to read this data -- this info will help us understand what your working with.

Fundamentally, the DATA STEP facilities to use with a BY statement, such as:

*sort your data using PROC SORT. ;
BY ;

* now analyze patterns of mydate values for ;
* an individual, using the DATA step. ;
DATA ...;
BY varname1 varname2;
IF FIRST.varname2 THEN DO;
* Use RETAIN to keep track of first date across ;
* DATA step executions. ;
RETAIN F_mydate 0;
F_mydate = mydate;
* code to compare mydate to F_mydate;

Scott Barry
SBBWorks, Inc.
Posts: 0

Re: Complicated Data Cleaning Code

I have tried this code, but the list is showing all the borrowers who have changed their Default_Status instead of only the borrower changed their Default_Status from Defaulted to Non_Default within 6 months. Any modification needed?

data Temp9 ;
changes = -1 ;
do until( last.borrid ) ;
set Data1 data2 data3...data40;
by borrID ;
last_Default_Status= lag(Default_Status);
if first.borrID then last_Default_Status= ' ' ;
if Default_Status ne last_Default_Status then changes + 1 ;
if changes >1 ;
Posts: 0

Re: Complicated Data Cleaning Code

My data format is like this:

Timekey mmddyy10. ;
Borrid $80. ;
Source_Id $11. ;
BorrName $49. ;
DefaultStatus $15. ;

Month-end file: 12/2007

Timekey Borrid Source_id BorrName DefaultStatus
12/31/2007 00123456789 4-123456 ABC Company Defaulted
12/31/2007 00987654321 2-157892 XYZ Company Non-default
New Contributor JMo
New Contributor
Posts: 2

Re: Complicated Data Cleaning Code

You might try using a hash table. There was a great paper from SGF2008 called How Do I Love Hash Tables? Let Me Count the Ways!. See the section titled "Deja Vu". You could change the code to check if DefaultStatus has changed and perform some action based on that information.
Ask a Question
Discussion stats
  • 7 replies
  • 5 in conversation