BookmarkSubscribeRSS Feed
dee_arr
Fluorite | Level 6

Good evening,

 

I am trying to create an identifier that will uniquely identify loans in my data over time. The data is quarterly, and a person can be paying multiple loans taken out at different points in time (multiple OpenQuarter values in sample data, below). Currently, a loan is identified uniquely as a combination of person ID, loan ID, and lender ID.

 

The problem is that loans are often transferred between lenders; loans can change either lender ID in the process (e.g. observations #5 and #23 in the sample data) or both loan ID and lender ID (e.g. observations #17 and #32 in the sample data). I can figure out when a loan is transferred because it will show up as a "new" loan-lender combination for the person, but the number of loans for that person that were originated in a given quarter will stay the same relative to the previous quarter, the sum of "HiCredit" (maximum loan amount) for loans originated in the same quarter will stay the same, and the total balance for loans originated in the same quarter is very similar ("fuzzy" match).

 

What I am after is the last column, "UniqueID" - a way to track the loan over time. I don't care if this unique ID is the first loan-lender combination, or the last one (after transfer), or a completely new number, as long as I can then properly calculate leads/lags of balance and other fields (many others not shown in data) for a particular loan.

 

Any ideas for how I could accomplish this would be very much appreciated!

 

Sample data below:

 

data WORK.IMPORT;
infile datalines dsd truncover;
input Obs:BEST12. Client:$18. Loan:$17. Lender:$6. Quarter:MMDDYY10. OpenQuarter:MMDDYY10. Balance:BEST12. HiCredit:BEST12. LoanTransfer:BEST12. UniqueID:$23.;
format Obs BEST12. Quarter MMDDYY10. OpenQuarter MMDDYY10. Balance BEST12. HiCredit BEST12. LoanTransfer BEST12.;
datalines;
1 00C3A26416BBD01F81 029VF12376VU Blue 09/01/2010 09/01/2010 1000 1000 . 029VF12376VU-Blue
2 00C3A26416BBD01F81 029VF12376VU Blue 12/01/2010 09/01/2010 2000 2000 . 029VF12376VU-Blue
3 00C3A26416BBD01F81 029VF12376VU Blue 03/01/2011 09/01/2010 1900 2000 . 029VF12376VU-Blue
4 00C3A26416BBD01F81 029VF12376VU Blue 09/01/2011 09/01/2010 1800 2000 . 029VF12376VU-Blue
5 00C3A26416BBD01F81 029VF12376VU Purple 12/01/2011 09/01/2010 1700 2000 1 029VF12376VU-Blue
6 00C3A26416BBD01F81 029VF12376VU Purple 03/01/2012 09/01/2010 1600 2000 . 029VF12376VU-Blue
7 00C3A26416BBD01F81 029VF12376VU Purple 06/01/2012 09/01/2010 1500 2000 . 029VF12376VU-Blue
8 00C3A26416BBD01F81 029VF12376VU Purple 09/01/2012 09/01/2010 1400 2000 . 029VF12376VU-Blue
9 00C3A26416BBD01F81 029VF12376VU Purple 12/01/2012 09/01/2010 1300 2000 . 029VF12376VU-Blue
10 00C3A26416BBD01F81 HJ-79991272075810 Green 03/01/2010 09/01/2006 3100 3000 . HJ-79991272075810-Green
11 00C3A26416BBD01F81 HJ-79991272075810 Green 06/01/2010 09/01/2006 3100 3000 . HJ-79991272075810-Green
12 00C3A26416BBD01F81 HJ-79991272075810 Green 09/01/2010 09/01/2006 3100 3000 . HJ-79991272075810-Green
13 00C3A26416BBD01F81 HJ-79991272075810 Green 12/01/2010 09/01/2006 3100 3000 . HJ-79991272075810-Green
14 00C3A26416BBD01F81 HJ-79991272075810 Green 03/01/2011 09/01/2006 3100 3000 . HJ-79991272075810-Green
15 00C3A26416BBD01F81 HJ-79991272075810 Green 06/01/2011 09/01/2006 3100 3000 . HJ-79991272075810-Green
16 00C3A26416BBD01F81 HJ-79991272075810 Green 09/01/2011 09/01/2006 3100 3000 . HJ-79991272075810-Green
17 00C3A26416BBD01F81 11672365903335X Orange 12/01/2011 09/01/2006 3100 3000 1 HJ-79991272075810-Green
18 00C3A26416BBD01F81 11672365903335X Orange 03/01/2012 09/01/2006 3100 3000 . HJ-79991272075810-Green
19 00C3A26416BBD01F81 11672365903335X Orange 06/01/2012 09/01/2006 3100 3000 . HJ-79991272075810-Green
20 01A48C760A07DEA200 14584QP34577 Purple 03/01/2010 03/01/2006 5000 5000 . 14584QP34577-Purple
21 01A48C760A07DEA200 14584QP34577 Purple 06/01/2010 03/01/2006 4900 5000 . 14584QP34577-Purple
22 01A48C760A07DEA200 14584QP34577 Purple 09/01/2010 03/01/2006 4800 5000 . 14584QP34577-Purple
23 01A48C760A07DEA200 14584QP34577 Green 12/01/2010 03/01/2006 4700 5000 1 14584QP34577-Purple
24 01A48C760A07DEA200 14584QP34577 Green 03/01/2012 03/01/2006 4600 5000 . 14584QP34577-Purple
25 01A48C760A07DEA200 14584QP34577 Green 06/01/2012 03/01/2006 4500 5000 . 14584QP34577-Purple
26 01A48C760A07DEA200 14584QP34577 Green 09/01/2012 03/01/2006 4400 5000 . 14584QP34577-Purple
27 01A48C760A07DEA200 14584QP34577 Green 12/01/2012 03/01/2006 4300 5000 . 14584QP34577-Purple
28 01A48C760A07DEA200 291FZ Green 12/01/2010 03/01/2006 4100 4000 . 291FZ-Green
29 01A48C760A07DEA200 291FZ Green 03/01/2011 03/01/2006 4200 4000 . 291FZ-Green
30 01A48C760A07DEA200 291FZ Green 06/01/2011 03/01/2006 4300 4000 . 291FZ-Green
31 01A48C760A07DEA200 291FZ Green 09/01/2011 03/01/2006 4400 4000 . 291FZ-Green
32 01A48C760A07DEA200 13490999291FZ Orange 12/01/2011 03/01/2006 4500 4000 1 291FZ-Green
33 01A48C760A07DEA200 13490999291FZ Orange 03/01/2012 03/01/2006 4600 4000 . 291FZ-Green
34 01A48C760A07DEA200 13490999291FZ Orange 06/01/2012 03/01/2006 4700 4000 . 291FZ-Green
35 01A48C760A07DEA200 13490999291FZ Orange 09/01/2012 03/01/2006 4800 4000 . 291FZ-Green
36 01A48C760A07DEA200 13490999291FZ Orange 12/01/2012 03/01/2006 4900 4000 . 291FZ-Green
;;;;
4 REPLIES 4
Norman21
Lapis Lazuli | Level 10

Would it be helpful to construct a unique ID from a combination of person ID, loan ID, lender ID and date of loan initiation?

Norman.
SAS 9.4 (TS1M6) X64_10PRO WIN 10.0.17763 Workstation

andreas_lds
Jade | Level 19

Let's see if i understood, what you have and need.

 

LoanTransfer is in your dataset, so you know if that loan has been transferred.

 

You can't use Loan and Lender in your identifier, because when they change it is impossible to create the initially generated identifier for that loan - in obs = 5 you don't have Lender = Blue. Relying on Client + OpenQuarter seems to be impossible, too, because a Client could have more than one loan opened in the same quarter, right?

 

dee_arr
Fluorite | Level 6
My apologies - LoanTransfer is a field I developed by looking at changes in the following fields:

Sum(count), by OpenQuarter, Quarter, Person ID
Sum(HiCredit), by OpenQuarter, Quarter, Person ID
Sum(Balance), by OpenQuarter, Quarter, Person ID
Running count of unique Loan-Lender combinations within Person

Yes, a person can have more than one loan originated in a quarter, unfortunately. Theoretically, they can have more than one loan originated in a quarter with the same hicredit, but it is extremely rare.

I guess where I get blocked is pulling the Loan/Lender ID of the loan before the transfer once I detect the transfer, so that I can pull it down?
andreas_lds
Jade | Level 19

If the loans are sorted by Quarter than the following step seems to solve the problem:

 

data work.want;
   set work.loans(drop= LoanTransfer rename=(UniqueId = ExpectedId));

   length 
      UniqueId $ 24
      changedClient changedLoan changedOpenQuarter 8
   ;

   drop changed: lastQuarter;

   retain UniqueId;

   changedClient = lag(Client) ^= Client;
   changedOpenQuarter = lag(OpenQuarter) ^= OpenQuarter;
   changedLoan = lag(Loan) ^= Loan;
   lastQuarter = lag(Quarter);

   if _n_ = 1 or changedClient or changedOpenQuarter or (lastQuarter >= Quarter) then do;
      UniqueId = cats(Loan, '-', Lender);
   end;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1040 views
  • 1 like
  • 3 in conversation