Good evening,
I am trying to create an identifier that will uniquely identify loans in my data over time. The data is quarterly, and a person can be paying multiple loans taken out at different points in time (multiple OpenQuarter values in sample data, below). Currently, a loan is identified uniquely as a combination of person ID, loan ID, and lender ID.
The problem is that loans are often transferred between lenders; loans can change either lender ID in the process (e.g. observations #5 and #23 in the sample data) or both loan ID and lender ID (e.g. observations #17 and #32 in the sample data). I can figure out when a loan is transferred because it will show up as a "new" loan-lender combination for the person, but the number of loans for that person that were originated in a given quarter will stay the same relative to the previous quarter, the sum of "HiCredit" (maximum loan amount) for loans originated in the same quarter will stay the same, and the total balance for loans originated in the same quarter is very similar ("fuzzy" match).
What I am after is the last column, "UniqueID" - a way to track the loan over time. I don't care if this unique ID is the first loan-lender combination, or the last one (after transfer), or a completely new number, as long as I can then properly calculate leads/lags of balance and other fields (many others not shown in data) for a particular loan.
Any ideas for how I could accomplish this would be very much appreciated!
Sample data below:
Would it be helpful to construct a unique ID from a combination of person ID, loan ID, lender ID and date of loan initiation?
Let's see if i understood, what you have and need.
LoanTransfer is in your dataset, so you know if that loan has been transferred.
You can't use Loan and Lender in your identifier, because when they change it is impossible to create the initially generated identifier for that loan - in obs = 5 you don't have Lender = Blue. Relying on Client + OpenQuarter seems to be impossible, too, because a Client could have more than one loan opened in the same quarter, right?
If the loans are sorted by Quarter than the following step seems to solve the problem:
data work.want;
   set work.loans(drop= LoanTransfer rename=(UniqueId = ExpectedId));
   length 
      UniqueId $ 24
      changedClient changedLoan changedOpenQuarter 8
   ;
   drop changed: lastQuarter;
   retain UniqueId;
   changedClient = lag(Client) ^= Client;
   changedOpenQuarter = lag(OpenQuarter) ^= OpenQuarter;
   changedLoan = lag(Loan) ^= Loan;
   lastQuarter = lag(Quarter);
   if _n_ = 1 or changedClient or changedOpenQuarter or (lastQuarter >= Quarter) then do;
      UniqueId = cats(Loan, '-', Lender);
   end;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.
