Hi,
I'd like to create segments for each distinct value in a given column based on a moving time window of 36 months. What I mean by that is that for the given data below, I need to check the time distances between each date of col_b for each distinct value of col_a and where the difference shorter than 36 months I want to create a segment.
data have; input col_a col_b :date9.; format col_b date9.; datalines; 123 30SEP2005 123 31MAY2009 123 30APR2014
123 30APR2014 456 31AUG2010 456 31JUL2012 456 31OCT2017 456 31OCT2021 456 31DEC2022 456 31JAN2024 ; run;
Like so:
data want; input col_a col_b :date9. col_c $; format col_b date9.; datalines; 123 30SEP2005 SEG_1 123 31MAY2009 SEG_2 123 31AUG2010 SEG_2 123 30APR2011 SEG_2 456 31AUG2010 SEG_1 456 31JUL2012 SEG_1 456 31OCT2017 SEG_2 456 31OCT2021 SEG_3 456 31DEC2022 SEG_3 456 31JAN2024 SEG_3 ; run;
The key point here is I need to start checking time differences by taking the earliest date as reference for each col_a value. If the time difference between the first earliest and second earliest dates are longer than 36 months than I need to take the second earliest date as reference for further checks. I hope I could articulate the problem clear enough.
Any idea for how to approach/solve this problem is appreciated.
The col_b dates in your have and want data sets do not match. See observation 3.
I think this does what you want though
data want;
set have;
by col_a;
lag_date = ifn(first.col_a, ., lag(col_b));
dif_months = intck('month', lag_date, col_b);
if first.col_a then col_c = 1;
else if dif_months > 36 then col_c + 1;
format lag_date date9.;
drop lag_date dif_months;
run;
Result:
col_a col_b col_c 123 30SEP2005 1 123 31MAY2009 2 123 30APR2014 3 123 30APR2014 3 456 31AUG2010 1 456 31JUL2012 1 456 31OCT2017 2 456 31OCT2021 3 456 31DEC2022 3 456 31JAN2024 3
The col_b dates in your have and want data sets do not match. See observation 3.
I think this does what you want though
data want;
set have;
by col_a;
lag_date = ifn(first.col_a, ., lag(col_b));
dif_months = intck('month', lag_date, col_b);
if first.col_a then col_c = 1;
else if dif_months > 36 then col_c + 1;
format lag_date date9.;
drop lag_date dif_months;
run;
Result:
col_a col_b col_c 123 30SEP2005 1 123 31MAY2009 2 123 30APR2014 3 123 30APR2014 3 456 31AUG2010 1 456 31JUL2012 1 456 31OCT2017 2 456 31OCT2021 3 456 31DEC2022 3 456 31JAN2024 3
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.