I am not exactly sure how to describe this, but I have a panel dataset of loans that gives the loan age by month. Occasionally there are gaps, and I want to record in a new variable the total number of gaps. For instance, see the data below:
I would like to save, by loan_id, the different gaps in x_loan_age. So I would like to save that there is a 38 month starting in 200508 (YYYYMM variable), a 9 month gap starting in 200801, and a 15 month gap starting in 200810.
Ideally I would create variables called "GAP1", "GAP1START", "GAP2", "GAP2START", "GAP3", and "GAP3START".
The output would then be:
38, 200508, 9, 200801, 15, 200810.
There are tons of loans in this dataset, so I want to do this by each loan_id.
Basically, this is a dataset of when people didn't pay their loans and the gaps are when the person actually paid. I am trying to discern when they were paying among these missed payments.
If this can't be handled by sas, alternatively I would just like to count the cumulative "gap" space. So for the above loan, returning a variable called "GAPS" that = 38+9+15.
Attached is a SAS dataset of the data I pasted. Thanks!
In the absence of proc timeseries, here is a two-data-step solution:
data gaps (keep=loan_id yyyymm gap ngaps_this_loan);
set have (keep=loan_id ) end=end_of_have;
by loan_id;
merge have have (firstobs=2 keep=x_loan_age rename=(x_loan_age=nxt_age));
if first.loan_id then ngaps_this_loan=0; /*N of gaps for this loan */
gap=nxt_age-x_loan_age;
if last.loan_id=0 and gap^=1;
ngaps_this_loan+1;
retain max_ngap; /*Track the maximum number of gaps */
max_ngap=max(max_ngap,ngaps_this_loan);
if dif(max_ngap) then call symput('max_ngap',cats(max_ngap));
run;
data want (keep=loan_id gapstart: gapsize:);
do g=1 by 1 until (last.loan_id);
set gaps;
by loan_id;
array gapstart{&max_ngap};
array gapsize{&max_ngap};
gapstart{g}=yyyymm;
gapsize{g}=gap;
end;
run;
Is this true timeseries data and do you have SAS/ETS (for PROC TIMESERIES)? See this article from @gsvolba for some ideas and a SAS macro that may help.
In the absence of proc timeseries, here is a two-data-step solution:
data gaps (keep=loan_id yyyymm gap ngaps_this_loan);
set have (keep=loan_id ) end=end_of_have;
by loan_id;
merge have have (firstobs=2 keep=x_loan_age rename=(x_loan_age=nxt_age));
if first.loan_id then ngaps_this_loan=0; /*N of gaps for this loan */
gap=nxt_age-x_loan_age;
if last.loan_id=0 and gap^=1;
ngaps_this_loan+1;
retain max_ngap; /*Track the maximum number of gaps */
max_ngap=max(max_ngap,ngaps_this_loan);
if dif(max_ngap) then call symput('max_ngap',cats(max_ngap));
run;
data want (keep=loan_id gapstart: gapsize:);
do g=1 by 1 until (last.loan_id);
set gaps;
by loan_id;
array gapstart{&max_ngap};
array gapsize{&max_ngap};
gapstart{g}=yyyymm;
gapsize{g}=gap;
end;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.