I am not sure how exactly to explain this.
I have a dataset of loans with ids, monthly indicators of time, and monthly indicators of how much is owed.
I want to count how many times a given loan occurs in the dataset for the previous 6 months. I also want to calculate the difference between the current observation's amount owed and the mean amount owed over those six months.
Here's what the data looks like. I want to calculate pmts and omb_n_change.
loan_id | calendar_time | omb_n | pmts | omb_n_change |
1 | 96 | 165836 | 0 | 0 |
1 | 97 | 165646 | 1 | 165646 - 165836 |
1 | 98 | 165646 | 2 | 165646 - average of (165646,165836) |
1 | 99 | 165646 | 3 | 165646 - average of (165646,165646,165836) |
1 | 110 | 161957 | 0 | 0 |
1 | 111 | 161748 | 1 | 161748 - 161957 |
1 | 112 | 161537 | 2 | 161537 - average of (161748,161957) |
1 | 113 | 161537 | 3 | 161537 - average of (161537,161748,161957) |
1 | 114 | 161537 | 4 | 161537 - average of (161537,161537,161748,161957) |
1 | 115 | 161326 | 5 | 161326 - average of (161537,161537,161537,161748,161957) |
1 | 116 | 161326 | 6 | 161326 - average of (161326,161537,161537,161537,161748,161957) |
1 | 117 | 161113 | 6 | 161113 - average of (161326,161326,161537,161537,161537,161748) |
Loan_id is an id for each loan, calendar_time indicates what month that payment was made, and omb_n indicates how much was still owed that month. So basically I want to count how many times loan_id "1" occurred in my data over calendar_time - 6 as well as the mean value of omb_n over that time period (then take the difference between omb_n and that mean).
I tried using a hash iterator but am a rookie with those so I got lost before I could really get anything coherent.
Assuming your table is not big. Otherwise, you need Hash Table.
data have;
input loan_id calendar_time omb_n;
cards;
1 96 165836
1 97 165646
1 98 165646
1 99 165646
1 110 161957
1 111 161748
1 112 161537
1 113 161537
1 114 161537
1 115 161326
1 116 161326
1 117 161113
;
proc sql;
create table want as
select *,
(select count(*) from have where loan_id=a.loan_id and calendar_time between a.calendar_time-1 and a.calendar_time-6) as pmts,
(select mean(omb_n) from have where loan_id=a.loan_id and calendar_time between a.calendar_time-1 and a.calendar_time-6) as average,
omb_n-calculated average as omb_n_change
from have as a
;
quit;
Your calendar_time 117 is more than 6 months from 110. Your illustration for 117 violates 6-month interval.
Title, description and table don't match. I don't see "calculate ... on how many times that ID is present in data". Please explain. And when you do that, post data in usable form and show exactly, what you expect as result.
Assuming your table is not big. Otherwise, you need Hash Table.
data have;
input loan_id calendar_time omb_n;
cards;
1 96 165836
1 97 165646
1 98 165646
1 99 165646
1 110 161957
1 111 161748
1 112 161537
1 113 161537
1 114 161537
1 115 161326
1 116 161326
1 117 161113
;
proc sql;
create table want as
select *,
(select count(*) from have where loan_id=a.loan_id and calendar_time between a.calendar_time-1 and a.calendar_time-6) as pmts,
(select mean(omb_n) from have where loan_id=a.loan_id and calendar_time between a.calendar_time-1 and a.calendar_time-6) as average,
omb_n-calculated average as omb_n_change
from have as a
;
quit;
This is perfect, thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.