BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nbs98495
Calcite | Level 5

I am not exactly sure how to describe this, but I have a panel dataset of loans that gives the loan age by month. Occasionally there are gaps, and I want to record in a new variable the total number of gaps. For instance, see the data below:

 

nbs98495_0-1616700402463.png

 

I would like to save, by loan_id, the different gaps in x_loan_age. So I would like to save that there is a 38 month starting in 200508 (YYYYMM variable), a 9 month gap starting in 200801, and a 15 month gap starting in 200810.

 

Ideally I would create variables called "GAP1", "GAP1START", "GAP2", "GAP2START", "GAP3", and "GAP3START".

 

The output would then be:

38, 200508, 9, 200801, 15, 200810.

 

There are tons of loans in this dataset, so I want to do this by each loan_id.

 

Basically, this is a dataset of when people didn't pay their loans and the gaps are when the person actually paid. I am trying to discern when they were paying among these missed payments. 

 

If this can't be handled by sas, alternatively I would just like to count the cumulative "gap" space. So for the above loan, returning a variable called "GAPS" that = 38+9+15. 

 

Attached is a SAS dataset of the data I pasted. Thanks!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

In the absence of proc timeseries, here is a two-data-step solution:

 


data gaps (keep=loan_id yyyymm gap ngaps_this_loan);
  set have (keep=loan_id ) end=end_of_have;
  by loan_id;
  merge have have (firstobs=2 keep=x_loan_age rename=(x_loan_age=nxt_age));

  if first.loan_id then ngaps_this_loan=0;  /*N of gaps for this loan */
  gap=nxt_age-x_loan_age;
  if last.loan_id=0 and gap^=1;
  ngaps_this_loan+1;

  retain max_ngap;  /*Track the maximum number of gaps */
  max_ngap=max(max_ngap,ngaps_this_loan);
  if dif(max_ngap) then call symput('max_ngap',cats(max_ngap));
run;

data want (keep=loan_id gapstart: gapsize:);
  do g=1 by 1 until (last.loan_id);
    set gaps;
    by loan_id;
    array gapstart{&max_ngap};
    array gapsize{&max_ngap};
    gapstart{g}=yyyymm;
    gapsize{g}=gap;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
ChrisHemedinger
Community Manager

Is this true timeseries data and do you have SAS/ETS (for PROC TIMESERIES)? See this article from @gsvolba for some ideas and a SAS macro that may help.

SAS For Dummies 3rd Edition! Check out the new edition, covering SAS 9.4, SAS Viya, and all of the modern ways to use SAS!
mkeintz
PROC Star

In the absence of proc timeseries, here is a two-data-step solution:

 


data gaps (keep=loan_id yyyymm gap ngaps_this_loan);
  set have (keep=loan_id ) end=end_of_have;
  by loan_id;
  merge have have (firstobs=2 keep=x_loan_age rename=(x_loan_age=nxt_age));

  if first.loan_id then ngaps_this_loan=0;  /*N of gaps for this loan */
  gap=nxt_age-x_loan_age;
  if last.loan_id=0 and gap^=1;
  ngaps_this_loan+1;

  retain max_ngap;  /*Track the maximum number of gaps */
  max_ngap=max(max_ngap,ngaps_this_loan);
  if dif(max_ngap) then call symput('max_ngap',cats(max_ngap));
run;

data want (keep=loan_id gapstart: gapsize:);
  do g=1 by 1 until (last.loan_id);
    set gaps;
    by loan_id;
    array gapstart{&max_ngap};
    array gapsize{&max_ngap};
    gapstart{g}=yyyymm;
    gapsize{g}=gap;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
nbs98495
Calcite | Level 5
This worked for me! Thank you! I will check out the PROC TIMESERIES too.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1125 views
  • 3 likes
  • 3 in conversation