Hi,
I'd like to create segments for each distinct value in a given column based on a moving time window of 36 months. What I mean by that is that for the given data below, I need to check the time distances between each date of col_b for each distinct value of col_a and where the difference shorter than 36 months I want to create a segment.
data have; input col_a col_b :date9.; format col_b date9.; datalines; 123 30SEP2005 123 31MAY2009 123 30APR2014
123 30APR2014 456 31AUG2010 456 31JUL2012 456 31OCT2017 456 31OCT2021 456 31DEC2022 456 31JAN2024 ; run;
Like so:
data want; input col_a col_b :date9. col_c $; format col_b date9.; datalines; 123 30SEP2005 SEG_1 123 31MAY2009 SEG_2 123 31AUG2010 SEG_2 123 30APR2011 SEG_2 456 31AUG2010 SEG_1 456 31JUL2012 SEG_1 456 31OCT2017 SEG_2 456 31OCT2021 SEG_3 456 31DEC2022 SEG_3 456 31JAN2024 SEG_3 ; run;
The key point here is I need to start checking time differences by taking the earliest date as reference for each col_a value. If the time difference between the first earliest and second earliest dates are longer than 36 months than I need to take the second earliest date as reference for further checks. I hope I could articulate the problem clear enough.
Any idea for how to approach/solve this problem is appreciated.
The col_b dates in your have and want data sets do not match. See observation 3.
I think this does what you want though
data want;
set have;
by col_a;
lag_date = ifn(first.col_a, ., lag(col_b));
dif_months = intck('month', lag_date, col_b);
if first.col_a then col_c = 1;
else if dif_months > 36 then col_c + 1;
format lag_date date9.;
drop lag_date dif_months;
run;
Result:
col_a col_b col_c 123 30SEP2005 1 123 31MAY2009 2 123 30APR2014 3 123 30APR2014 3 456 31AUG2010 1 456 31JUL2012 1 456 31OCT2017 2 456 31OCT2021 3 456 31DEC2022 3 456 31JAN2024 3
The col_b dates in your have and want data sets do not match. See observation 3.
I think this does what you want though
data want;
set have;
by col_a;
lag_date = ifn(first.col_a, ., lag(col_b));
dif_months = intck('month', lag_date, col_b);
if first.col_a then col_c = 1;
else if dif_months > 36 then col_c + 1;
format lag_date date9.;
drop lag_date dif_months;
run;
Result:
col_a col_b col_c 123 30SEP2005 1 123 31MAY2009 2 123 30APR2014 3 123 30APR2014 3 456 31AUG2010 1 456 31JUL2012 1 456 31OCT2017 2 456 31OCT2021 3 456 31DEC2022 3 456 31JAN2024 3
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.