Hi All,
I have a Data Set which kind of looks like below -
Loan_ID | Sys_Date | Payment | Scheduled Payment | Flag |
4 | 8-Feb-18 | $135.96 | $135.96 | Y |
4 | 22-Feb-18 | $135.96 | $135.96 | Y |
4 | 8-Mar-18 | $135.96 | $135.96 | Y |
4 | 22-Mar-18 | $135.96 | $135.96 | Y |
4 | 5-Apr-18 | $135.96 | $135.96 | Y |
4 | 19-Apr-18 | $135.96 | $135.96 | Y |
4 | 3-May-18 | $135.96 | $135.96 | Y |
4 | 17-May-18 | $135.96 | $135.96 | Y |
4 | 31-May-18 | $135.96 | $135.96 | Y |
4 | 14-Jun-18 | $135.96 | $135.96 | Y |
4 | 28-Jun-18 | $135.96 | $135.96 | Y |
4 | 12-Jul-18 | $135.96 | $135.96 | Y |
4 | 26-Jul-18 | $4,702.18 | $135.96 | Y |
5 | 15-Feb-18 | $0.00 | $137.52 | N |
5 | 1-Mar-18 | $137.52 | $137.52 | Y |
5 | 15-Mar-18 | $137.52 | $137.52 | Y |
5 | 29-Mar-18 | $137.52 | $137.52 | Y |
5 | 12-Apr-18 | $0.00 | $137.52 | N |
8189 | 17-Aug-18 | $51.44 | $51.44 | Y |
8189 | 31-Aug-18 | $51.44 | $51.44 | Y |
8189 | 14-Sep-18 | $0.00 | $51.44 | N |
8189 | 28-Sep-18 | $51.44 | $51.44 | Y |
8189 | 12-Oct-18 | $0.00 | $51.44 | N |
8189 | 26-Oct-18 | $0.00 | $51.44 | N |
8189 | 9-Nov-18 | $51.44 | $51.44 | Y |
8189 | 15-Nov-18 | $202.88 | $51.44 | Y |
8189 | 23-Nov-18 | $51.44 | $51.44 | Y |
8189 | 7-Dec-18 | $51.44 | $51.44 | N |
8189 | 21-Dec-18 | $51.44 | $51.44 | N |
9999 | 17-Aug-18 | $99.30 | $99.30 | Y |
9999 | 31-Aug-18 | $0.00 | $99.30 | N |
9999 | 5-Sep-18 | $99.30 | $99.30 | Y |
9999 | 14-Sep-18 | $99.30 | $99.30 | Y |
9999 | 28-Sep-18 | $0.00 | $99.30 | N |
9999 | 29-Sep-18 | $99.30 | $99.30 | Y |
9999 | 12-Oct-18 | $99.30 | $99.30 | Y |
9999 | 16-Oct-18 | $49.30 | $99.30 | Y |
9999 | 25-Oct-18 | ($149.30) | $99.30 | N |
9999 | 26-Oct-18 | $99.30 | $99.30 | Y |
1111 | 28-Sep-18 | $50.00 | $50.00 | Y |
1111 | 29-Sep-18 | $50.00 | $50.00 | Y |
1111 | 12-Oct-18 | $30.00 | $50.00 | N |
1111 | 16-Oct-18 | $0.00 | $50.00 | N |
1111 | 25-Oct-18 | $50.00 | $50.00 | Y |
1111 | 26-Oct-18 | $0.00 | $50.00 | N |
Loan_ID,
Sys_Date,
Payment (which is the Paid Amount),
Scheduled Payment (which is the scheduled Payment amount for the Loan),
Flag (Y = Successful Payment; N = Unsuccessful Payment).
From the data set below I only need the ID's which have 3 consecutive payments (given the condition that PAYMENT >= SCHEDULED_PAYMENT)
For Example from the above data I only need to keep the following ID's (Loan_id 4,5 & 8189)
I need to keep loan_ID 4 as it meets 3 consecutive Payments (I.E. Each Payment is >= Scheduled Payment and as per Y in the flag field they were all Successful)
I need to keep loan_ID 5 AS THIS ALSO HAS 3 CONSECUTIVE PAYMENTS (i.e. Three times FLAG = Y and payment >= scheduled payment)
I need to keep 8189 this also has an instance of three consecutive payments (i.e. i.e. Three times FLAG = Y and payment >= scheduled payment)
I need to EXCLUDE 9999 as this does not meet my criteria, you can see that even though there are times FLAG = Y but one of the amounts in Payments ($49.30) is < Scheduled Payment ($99.30) therefore this has to be excluded.
i NEED TO EXCLUDE 1111 AS IT DOES NOT MEET THE 3 Consecutive payments criteria
I have tried different variations but I am unable to get to the final result which should only Show
Loan_ID |
4 |
5 |
8189 |
Any help in solving this dilemma is much appreciated!
You should provide an example of what you expect the actual output data set to look like given that input data. I am not sure if you are expecting 3 records or many more.
Also, can you try to clean up your definition of "consecutive". Listing examples doesn't necessarily provide the rule(s) needed.
You say:
From the data set below I only need the ID's which have 3 consecutive payments
(given the condition that PAYMENT >= SCHEDULED_PAYMENT)
but then wander off with Flag values in the examples. So you should data the flag condition a bit more explicitly.
Thanks ballardw for the response -
The actual output data just needs to be the Loan_ID that meets the condition -
3 Consecutive Payments where each payment is >= scheduled payment
I just created the FLAG (i.e. Y or N) so that it would be easier to understand.
Basically I only need to keep the Loan_ID that has 3 consecutive Y and for each of the 3 the Payment should be >= scheduled payment
So my output should be only the Loan_id'S that satisfy the above and for this data set it is 4,5,8189
Hope this helps
Below (untested) code should do the job. Source table "have" must be pre-sorted by loan_id sysdate.
data want(keep=loan_id);
set have;
by loan_id sys_date;
if first.loan_id then count=0;
if count<3 then
do;
if flag='Y' and payment>=scheduled_payment then count+1;
else count=0;
end;
if last.loan_id and count=3 then output;
run;
No data is given, so this untested, but should do roughly what you want:
data WANT ;
COUNT=0;
do until(last.ID);
set HAVE;
by ID notsorted;
if FLAG='Y' & PAYMENT >= SCHEDULED then COUNT+1;
else if COUNT <3 then COUNT=0;
end;
do until(last.ID);
set HAVE;
by ID notsorted;
if COUNT >= 3 then output;
end;
run;
[Edit: code altered slightly]
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.