I am not exactly sure how to describe this, but I have a panel dataset of loans that gives the loan age by month. Occasionally there are gaps, and I want to record in a new variable the total number of gaps. For instance, see the data below:
I would like to save, by loan_id, the different gaps in x_loan_age. So I would like to save that there is a 38 month starting in 200508 (YYYYMM variable), a 9 month gap starting in 200801, and a 15 month gap starting in 200810.
Ideally I would create variables called "GAP1", "GAP1START", "GAP2", "GAP2START", "GAP3", and "GAP3START".
The output would then be:
38, 200508, 9, 200801, 15, 200810.
There are tons of loans in this dataset, so I want to do this by each loan_id.
Basically, this is a dataset of when people didn't pay their loans and the gaps are when the person actually paid. I am trying to discern when they were paying among these missed payments.
If this can't be handled by sas, alternatively I would just like to count the cumulative "gap" space. So for the above loan, returning a variable called "GAPS" that = 38+9+15.
Attached is a SAS dataset of the data I pasted. Thanks!
In the absence of proc timeseries, here is a two-data-step solution:
data gaps (keep=loan_id yyyymm gap ngaps_this_loan);
set have (keep=loan_id ) end=end_of_have;
by loan_id;
merge have have (firstobs=2 keep=x_loan_age rename=(x_loan_age=nxt_age));
if first.loan_id then ngaps_this_loan=0; /*N of gaps for this loan */
gap=nxt_age-x_loan_age;
if last.loan_id=0 and gap^=1;
ngaps_this_loan+1;
retain max_ngap; /*Track the maximum number of gaps */
max_ngap=max(max_ngap,ngaps_this_loan);
if dif(max_ngap) then call symput('max_ngap',cats(max_ngap));
run;
data want (keep=loan_id gapstart: gapsize:);
do g=1 by 1 until (last.loan_id);
set gaps;
by loan_id;
array gapstart{&max_ngap};
array gapsize{&max_ngap};
gapstart{g}=yyyymm;
gapsize{g}=gap;
end;
run;
Is this true timeseries data and do you have SAS/ETS (for PROC TIMESERIES)? See this article from @gsvolba for some ideas and a SAS macro that may help.
In the absence of proc timeseries, here is a two-data-step solution:
data gaps (keep=loan_id yyyymm gap ngaps_this_loan);
set have (keep=loan_id ) end=end_of_have;
by loan_id;
merge have have (firstobs=2 keep=x_loan_age rename=(x_loan_age=nxt_age));
if first.loan_id then ngaps_this_loan=0; /*N of gaps for this loan */
gap=nxt_age-x_loan_age;
if last.loan_id=0 and gap^=1;
ngaps_this_loan+1;
retain max_ngap; /*Track the maximum number of gaps */
max_ngap=max(max_ngap,ngaps_this_loan);
if dif(max_ngap) then call symput('max_ngap',cats(max_ngap));
run;
data want (keep=loan_id gapstart: gapsize:);
do g=1 by 1 until (last.loan_id);
set gaps;
by loan_id;
array gapstart{&max_ngap};
array gapsize{&max_ngap};
gapstart{g}=yyyymm;
gapsize{g}=gap;
end;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.