Hi,
I'd like to create segments for each distinct value in a given column based on a moving time window of 36 months. What I mean by that is that for the given data below, I need to check the time distances between each date of col_b for each distinct value of col_a and where the difference shorter than 36 months I want to create a segment.
data have;
input col_a col_b :date9.;
format col_b date9.;
datalines;
123 30SEP2005
123 31MAY2009
123 30APR2014
123 30APR2014
456 31AUG2010
456 31JUL2012
456 31OCT2017
456 31OCT2021
456 31DEC2022
456 31JAN2024
;
run;Like so:
data want;
input col_a col_b :date9. col_c $;
format col_b date9.;
datalines;
123 30SEP2005 SEG_1
123 31MAY2009 SEG_2
123 31AUG2010 SEG_2
123 30APR2011 SEG_2
456 31AUG2010 SEG_1
456 31JUL2012 SEG_1
456 31OCT2017 SEG_2
456 31OCT2021 SEG_3
456 31DEC2022 SEG_3
456 31JAN2024 SEG_3
;
run;The key point here is I need to start checking time differences by taking the earliest date as reference for each col_a value. If the time difference between the first earliest and second earliest dates are longer than 36 months than I need to take the second earliest date as reference for further checks. I hope I could articulate the problem clear enough.
Any idea for how to approach/solve this problem is appreciated.
The col_b dates in your have and want data sets do not match. See observation 3.
I think this does what you want though
data want;
set have;
by col_a;
lag_date = ifn(first.col_a, ., lag(col_b));
dif_months = intck('month', lag_date, col_b);
if first.col_a then col_c = 1;
else if dif_months > 36 then col_c + 1;
format lag_date date9.;
drop lag_date dif_months;
run;
Result:
col_a col_b col_c 123 30SEP2005 1 123 31MAY2009 2 123 30APR2014 3 123 30APR2014 3 456 31AUG2010 1 456 31JUL2012 1 456 31OCT2017 2 456 31OCT2021 3 456 31DEC2022 3 456 31JAN2024 3
The col_b dates in your have and want data sets do not match. See observation 3.
I think this does what you want though
data want;
set have;
by col_a;
lag_date = ifn(first.col_a, ., lag(col_b));
dif_months = intck('month', lag_date, col_b);
if first.col_a then col_c = 1;
else if dif_months > 36 then col_c + 1;
format lag_date date9.;
drop lag_date dif_months;
run;
Result:
col_a col_b col_c 123 30SEP2005 1 123 31MAY2009 2 123 30APR2014 3 123 30APR2014 3 456 31AUG2010 1 456 31JUL2012 1 456 31OCT2017 2 456 31OCT2021 3 456 31DEC2022 3 456 31JAN2024 3
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.