Hi All,
I have a Data Set which kind of looks like below -
Loan_ID | Sys_Date | Payment | Scheduled Payment | Flag |
4 | 8-Feb-18 | $135.96 | $135.96 | Y |
4 | 22-Feb-18 | $135.96 | $135.96 | Y |
4 | 8-Mar-18 | $135.96 | $135.96 | Y |
4 | 22-Mar-18 | $135.96 | $135.96 | Y |
4 | 5-Apr-18 | $135.96 | $135.96 | Y |
4 | 19-Apr-18 | $135.96 | $135.96 | Y |
4 | 3-May-18 | $135.96 | $135.96 | Y |
4 | 17-May-18 | $135.96 | $135.96 | Y |
4 | 31-May-18 | $135.96 | $135.96 | Y |
4 | 14-Jun-18 | $135.96 | $135.96 | Y |
4 | 28-Jun-18 | $135.96 | $135.96 | Y |
4 | 12-Jul-18 | $135.96 | $135.96 | Y |
4 | 26-Jul-18 | $4,702.18 | $135.96 | Y |
5 | 15-Feb-18 | $0.00 | $137.52 | N |
5 | 1-Mar-18 | $137.52 | $137.52 | Y |
5 | 15-Mar-18 | $137.52 | $137.52 | Y |
5 | 29-Mar-18 | $137.52 | $137.52 | Y |
5 | 12-Apr-18 | $0.00 | $137.52 | N |
8189 | 17-Aug-18 | $51.44 | $51.44 | Y |
8189 | 31-Aug-18 | $51.44 | $51.44 | Y |
8189 | 14-Sep-18 | $0.00 | $51.44 | N |
8189 | 28-Sep-18 | $51.44 | $51.44 | Y |
8189 | 12-Oct-18 | $0.00 | $51.44 | N |
8189 | 26-Oct-18 | $0.00 | $51.44 | N |
8189 | 9-Nov-18 | $51.44 | $51.44 | Y |
8189 | 15-Nov-18 | $202.88 | $51.44 | Y |
8189 | 23-Nov-18 | $51.44 | $51.44 | Y |
8189 | 7-Dec-18 | $51.44 | $51.44 | N |
8189 | 21-Dec-18 | $51.44 | $51.44 | N |
9999 | 17-Aug-18 | $99.30 | $99.30 | Y |
9999 | 31-Aug-18 | $0.00 | $99.30 | N |
9999 | 5-Sep-18 | $99.30 | $99.30 | Y |
9999 | 14-Sep-18 | $99.30 | $99.30 | Y |
9999 | 28-Sep-18 | $0.00 | $99.30 | N |
9999 | 29-Sep-18 | $99.30 | $99.30 | Y |
9999 | 12-Oct-18 | $99.30 | $99.30 | Y |
9999 | 16-Oct-18 | $49.30 | $99.30 | Y |
9999 | 25-Oct-18 | ($149.30) | $99.30 | N |
9999 | 26-Oct-18 | $99.30 | $99.30 | Y |
1111 | 28-Sep-18 | $50.00 | $50.00 | Y |
1111 | 29-Sep-18 | $50.00 | $50.00 | Y |
1111 | 12-Oct-18 | $30.00 | $50.00 | N |
1111 | 16-Oct-18 | $0.00 | $50.00 | N |
1111 | 25-Oct-18 | $50.00 | $50.00 | Y |
1111 | 26-Oct-18 | $0.00 | $50.00 | N |
Loan_ID,
Sys_Date,
Payment (which is the Paid Amount),
Scheduled Payment (which is the scheduled Payment amount for the Loan),
Flag (Y = Successful Payment; N = Unsuccessful Payment).
From the data set below I only need the ID's which have 3 consecutive payments (given the condition that PAYMENT >= SCHEDULED_PAYMENT)
For Example from the above data I only need to keep the following ID's (Loan_id 4,5 & 8189)
I need to keep loan_ID 4 as it meets 3 consecutive Payments (I.E. Each Payment is >= Scheduled Payment and as per Y in the flag field they were all Successful)
I need to keep loan_ID 5 AS THIS ALSO HAS 3 CONSECUTIVE PAYMENTS (i.e. Three times FLAG = Y and payment >= scheduled payment)
I need to keep 8189 this also has an instance of three consecutive payments (i.e. i.e. Three times FLAG = Y and payment >= scheduled payment)
I need to EXCLUDE 9999 as this does not meet my criteria, you can see that even though there are times FLAG = Y but one of the amounts in Payments ($49.30) is < Scheduled Payment ($99.30) therefore this has to be excluded.
i NEED TO EXCLUDE 1111 AS IT DOES NOT MEET THE 3 Consecutive payments criteria
I have tried different variations but I am unable to get to the final result which should only Show
Loan_ID |
4 |
5 |
8189 |
Any help in solving this dilemma is much appreciated!
You should provide an example of what you expect the actual output data set to look like given that input data. I am not sure if you are expecting 3 records or many more.
Also, can you try to clean up your definition of "consecutive". Listing examples doesn't necessarily provide the rule(s) needed.
You say:
From the data set below I only need the ID's which have 3 consecutive payments
(given the condition that PAYMENT >= SCHEDULED_PAYMENT)
but then wander off with Flag values in the examples. So you should data the flag condition a bit more explicitly.
Thanks ballardw for the response -
The actual output data just needs to be the Loan_ID that meets the condition -
3 Consecutive Payments where each payment is >= scheduled payment
I just created the FLAG (i.e. Y or N) so that it would be easier to understand.
Basically I only need to keep the Loan_ID that has 3 consecutive Y and for each of the 3 the Payment should be >= scheduled payment
So my output should be only the Loan_id'S that satisfy the above and for this data set it is 4,5,8189
Hope this helps
Below (untested) code should do the job. Source table "have" must be pre-sorted by loan_id sysdate.
data want(keep=loan_id);
set have;
by loan_id sys_date;
if first.loan_id then count=0;
if count<3 then
do;
if flag='Y' and payment>=scheduled_payment then count+1;
else count=0;
end;
if last.loan_id and count=3 then output;
run;
No data is given, so this untested, but should do roughly what you want:
data WANT ;
COUNT=0;
do until(last.ID);
set HAVE;
by ID notsorted;
if FLAG='Y' & PAYMENT >= SCHEDULED then COUNT+1;
else if COUNT <3 then COUNT=0;
end;
do until(last.ID);
set HAVE;
by ID notsorted;
if COUNT >= 3 then output;
end;
run;
[Edit: code altered slightly]
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.