For example, I have 3 months of data stacking like this, and this is what I want to ask. I want to create a new data set based on this one but in which I want to delete accounts having balance constantly below 100 threshold since month 1 and forward. And I would like to repeat the same process for individual month #2, #3 until the end of of my data set. With that purpose, the code should only delete account 1D, 2A and 2D for month 1 and account 4A for month 2. This is just an example represented what I want to do. My actual data is way bigger than this. It has at least 50 months of data so I am looking for a MACRO SAS DO loop that can help me perform what I would like to. Thanks a lot.
Account Number | Month | Balance |
1A | 1 | 90 |
1B | 1 | 500 |
1C | 1 | 600 |
1D | 1 | 80 |
2A | 1 | 300 |
2B | 1 | 200 |
2C | 1 | 75 |
2D | 1 | 780 |
3A | 1 | 650 |
3B | 1 | 90 |
3C | 1 | 336 |
3D | 1 | 296 |
1A | 2 | 250 |
1B | 2 | 525 |
1C | 2 | 630 |
1D | 2 | 60 |
2A | 2 | 90 |
2B | 2 | 95 |
2C | 2 | 125 |
2D | 2 | 80 |
3A | 2 | 300 |
3B | 2 | 145 |
3C | 2 | 300 |
3D | 2 | 128 |
4A | 2 | 125 |
4B | 2 | 150 |
1A | 3 | 45 |
1B | 3 | 130 |
1C | 3 | 150 |
1D | 3 | 50 |
2A | 3 | 50 |
2B | 3 | 157 |
2C | 3 | 100 |
2D | 3 | 60 |
3A | 3 | 150 |
3B | 3 | 300 |
3C | 3 | 145 |
3D | 3 | 530 |
4A | 3 | 90 |
4B | 3 | 115 |
Then please mark the post that did it as the solution.
No macro needed. Something like this should work
/* UNTESTED CODE */
proc summary data=have;
class account_number;
var balance;
output out=maxbal max=max_balance;
run;
proc sort data=have;
by account_number;
run;
/* Delete any account_number that has max_balance < 100 */
data want;
merge have maxbal;
by account_number;
if max_balance<100 then delete;
run;
Your rule does not make sense. Why would you delete A4 for month 2, when it has a balance of 125, which is > 100 AFAIK?
Only account 1D satisfies your rule for deleting.
4A for month 2 before in month 3, that account dropped below 100 already, so I want to delete it.
That's not your rule:
I quote from your initial post:
delete accounts having balance constantly below 100 threshold since month 1
(emphasis by me)
Please make up your mind and post a clear and correct rule.
Sorry If I am not that clear. What I want is that for month 1, I want to delete all accounts having balance constantly below 100 since month 1. But then in month 2, I want to delete all accounts having balance constantly below 100 since month 2, etc. Is it clear now?
No. 4A is not constantly below 100, as it has a value of 125 in month 2.
Sorry I do apologize for that example. but pretty much this is what I wanna do. For individual snapshot, I want to delete accounts that staying below 100 since that particular snapshot
So, apparently I have misunderstood the requirements, and so the code I gave is not correct.
let me make sure I have the requirements understood properly now.
You want to delete accounts where the values are always under 100 after the first month that the account appears. The first month the account appears could be over 100, and we still delete the account if all subsequent months are under 100.
Is that a correct understanding?
I want to delete accounts where the values are always under 100 after the month its value starts dropping below 100. So for account 4A example, since its value is above 100 in month 2, it should be kept in month 2 snapshot. But if its value keeps staying below 100 since month 3 (where its value dropped below 100), it should be deleted from the dataset since month 3 snapshot and in later subsequent months. So in your final data, you still see account 4A in month 2, but not in month 3 and forward. Hopefully this is clear. Please let me know if I need to clarify it more.
I want to delete accounts where the values are always under 100 after the month its value starts dropping below 100.
This is different than what I thought and different than previous explanations. So, if we have a longer time series and a certain account has values
120
120
100
80
80
60
70
we would include this account in month 1, include this account in month 2, include this account in month 3, but not include the account after month 3.
If the sequence of data is
120
120
100
80
80
60
120
then the account is always included. Is that right?
I will try my best here. For your example,
120
120
100
80
80
60
70
For give those value in month #1, if the last 4 account have values going back above 100 in month #5, I want to keep them in month 1, month 2, #3, #4 and #5 as long as its value above 100. but then if one of those starts dropping below 100 (for example in month #6) and constantly stay below until the end of series, then it should be deleted since month #6. Is that clear?
Not clear at all.
Why don't we delete the account at month 4? All the values are below 100 at month 4 and beyond.
but if I do that, it may delete some accounts that may go above 100 in later months. cause if it goes above 100 in later months, I dont want to delete them in current months unless its value is always below 100 since current month.
cause I want to model attrition rate. We dont care about accounts with balance below 100 dollars. But if we just simply deleted account with below 100 for each particular month, it will overstate attrition rate for that month. Because when accounts dropped below 100, it does not actually meant it leaves the bank. But right now by deleting those, it is pretty much I assume they leave the bank for that month (account balance < 100) but actually they do not.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.