Hello,
I have a large dataset pulling account balances for multiple years at a time. This query is all nested in a macro so that I can pull multiple years at the same time.
In the below example, ID 1 had a change in their loan amount, so I want to pull the most recent loan amount for that account as their 'current loan amount'. But, I also need to figure out the average monthly loan amount. So I want to sum their loan amounts for the year, but the current loan amount for that account should be the most recent one (2022-12). The way I've written it in my code, it will only pull the loan amount for 2022-12 and give me blanks for every other month, and not allow me to sum the loan amounts for the year.
Example:
Existing Dataset
Account ID
Month
Loan Amount
Balance
1
2022-01
$10,000
$9,000
1
2022-04
$10,000
$8,500
1
2022-08
$10,000
$7,500
1
2022-12
$15,000
$10,000
2
2022-01
$5,000
$5,000
2
2022-04
$5,000
$5,000
2
2022-08
$5,000
$5,000
2
2022-12
$5,000
$5,000
Want Dataset:
Account ID
Year
Most Recent Loan Amt
Total Loan Amounts
Total balances
1
2022
$15,000
$45,000
$35,000
2
2022
$5,000
$20,000
$20,000
proc sql;
create table data_want_&yyyy. as
select acct_id
, year
, open_month
, case when period = (&yyyy.*100+12) then loan_amount end as final_loan_amount
, sum(case when period = (&yyyy.*100+12) then total_payments end) as total_payments_&yyyy.
, sum(total_balance) as total_balances_&yyyy.
, sum(loan_amt) as loan_amt_&yyyy.
, sum(fee_late_net_amt) as net_late_fee_amt_&yyyy.
, sum(fee_nsf_net_amt) as net_nsf_fee_amt_&yyyy.
, sum(fee_cash_net_amt) as fee_cash_net_amt_&yyyy.
, sum(net_interest_amt) as net_interest_&yyyy.
, sum(net_revenue) as net_revenue_&yyyy.
, count(distinct acct_id) as num_accts_&yyyy.
from data_have
group by 1,2,3,4
order by ac_i, period;
quit;
Thank you in advance! Let me know if I can clarify more, this was challenging to explain.
... View more