BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
nbora
Calcite | Level 5

Hi,

 

I'd like to create segments for each distinct value in a given column based on a moving time window of 36 months. What I mean by that is that for the given data below, I need to check the time distances between each date of col_b for each distinct value of col_a and where the difference shorter than 36 months I want to create a segment. 

 

data have;
     input col_a col_b :date9.;
     format col_b date9.;
     datalines;
     123 30SEP2005
     123 31MAY2009
     123 30APR2014
123 30APR2014 456 31AUG2010 456 31JUL2012 456 31OCT2017 456 31OCT2021 456 31DEC2022 456 31JAN2024 ; run;

 Like so:

 

data want;
     input col_a col_b :date9. col_c $;
     format col_b date9.;
     datalines;
     123 30SEP2005 SEG_1
     123 31MAY2009 SEG_2
     123 31AUG2010 SEG_2
     123 30APR2011 SEG_2
     456 31AUG2010 SEG_1
     456 31JUL2012 SEG_1
     456 31OCT2017 SEG_2
     456 31OCT2021 SEG_3
     456 31DEC2022 SEG_3
     456 31JAN2024 SEG_3
;
run;

The key point here is I need to start checking time differences by taking the earliest date as reference for each col_a value. If the time difference between the first earliest and second earliest dates are longer than 36 months than I need to take the second earliest date as reference for further checks. I hope I could articulate the problem clear enough.

Any idea for how to approach/solve this problem is appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

The col_b dates in your have and want data sets do not match. See observation 3.

 

I think this does what you want though

 

data want;
   set have;
   by col_a;

   lag_date = ifn(first.col_a, ., lag(col_b));
   dif_months = intck('month', lag_date, col_b);

   if first.col_a          then col_c = 1;
   else if dif_months > 36 then col_c + 1;

   format lag_date date9.;
   drop lag_date dif_months;
run;

 

Result:

 

col_a  col_b      col_c
123    30SEP2005  1
123    31MAY2009  2
123    30APR2014  3
123    30APR2014  3
456    31AUG2010  1
456    31JUL2012  1
456    31OCT2017  2
456    31OCT2021  3
456    31DEC2022  3
456    31JAN2024  3

View solution in original post

1 REPLY 1
PeterClemmensen
Tourmaline | Level 20

The col_b dates in your have and want data sets do not match. See observation 3.

 

I think this does what you want though

 

data want;
   set have;
   by col_a;

   lag_date = ifn(first.col_a, ., lag(col_b));
   dif_months = intck('month', lag_date, col_b);

   if first.col_a          then col_c = 1;
   else if dif_months > 36 then col_c + 1;

   format lag_date date9.;
   drop lag_date dif_months;
run;

 

Result:

 

col_a  col_b      col_c
123    30SEP2005  1
123    31MAY2009  2
123    30APR2014  3
123    30APR2014  3
456    31AUG2010  1
456    31JUL2012  1
456    31OCT2017  2
456    31OCT2021  3
456    31DEC2022  3
456    31JAN2024  3

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 308 views
  • 1 like
  • 2 in conversation