BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nbora
Calcite | Level 5

Hi,

 

I'd like to create segments for each distinct value in a given column based on a moving time window of 36 months. What I mean by that is that for the given data below, I need to check the time distances between each date of col_b for each distinct value of col_a and where the difference shorter than 36 months I want to create a segment. 

 

data have;
     input col_a col_b :date9.;
     format col_b date9.;
     datalines;
     123 30SEP2005
     123 31MAY2009
     123 30APR2014
123 30APR2014 456 31AUG2010 456 31JUL2012 456 31OCT2017 456 31OCT2021 456 31DEC2022 456 31JAN2024 ; run;

 Like so:

 

data want;
     input col_a col_b :date9. col_c $;
     format col_b date9.;
     datalines;
     123 30SEP2005 SEG_1
     123 31MAY2009 SEG_2
     123 31AUG2010 SEG_2
     123 30APR2011 SEG_2
     456 31AUG2010 SEG_1
     456 31JUL2012 SEG_1
     456 31OCT2017 SEG_2
     456 31OCT2021 SEG_3
     456 31DEC2022 SEG_3
     456 31JAN2024 SEG_3
;
run;

The key point here is I need to start checking time differences by taking the earliest date as reference for each col_a value. If the time difference between the first earliest and second earliest dates are longer than 36 months than I need to take the second earliest date as reference for further checks. I hope I could articulate the problem clear enough.

Any idea for how to approach/solve this problem is appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

The col_b dates in your have and want data sets do not match. See observation 3.

 

I think this does what you want though

 

data want;
   set have;
   by col_a;

   lag_date = ifn(first.col_a, ., lag(col_b));
   dif_months = intck('month', lag_date, col_b);

   if first.col_a          then col_c = 1;
   else if dif_months > 36 then col_c + 1;

   format lag_date date9.;
   drop lag_date dif_months;
run;

 

Result:

 

col_a  col_b      col_c
123    30SEP2005  1
123    31MAY2009  2
123    30APR2014  3
123    30APR2014  3
456    31AUG2010  1
456    31JUL2012  1
456    31OCT2017  2
456    31OCT2021  3
456    31DEC2022  3
456    31JAN2024  3

View solution in original post

1 REPLY 1
PeterClemmensen
Tourmaline | Level 20

The col_b dates in your have and want data sets do not match. See observation 3.

 

I think this does what you want though

 

data want;
   set have;
   by col_a;

   lag_date = ifn(first.col_a, ., lag(col_b));
   dif_months = intck('month', lag_date, col_b);

   if first.col_a          then col_c = 1;
   else if dif_months > 36 then col_c + 1;

   format lag_date date9.;
   drop lag_date dif_months;
run;

 

Result:

 

col_a  col_b      col_c
123    30SEP2005  1
123    31MAY2009  2
123    30APR2014  3
123    30APR2014  3
456    31AUG2010  1
456    31JUL2012  1
456    31OCT2017  2
456    31OCT2021  3
456    31DEC2022  3
456    31JAN2024  3

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 696 views
  • 1 like
  • 2 in conversation