There are ways to solve this problem, but let me first ask you, why the overlap in your output dataset. For example, why 2002 is shared by A and A-B? It seems to me that the moment B is brought in, it is not A anymore. I am aware that a year is a range of time, but I also noticed that you used it as an UNIT, so it makes me wary upon your downstream process. Here is an option: DATA have; input id trt $ start end; cards; 1 A 2000 2004 1 B 2002 2006 1 C 2004 2009 1 D 2012 2015 ; run; data _null_; if _n_=1 then do; declare hash h(); h.definekey('id','year'); h.definedata('id','year', 'treatment'); h.definedone(); length treatment $ 20; end; do until (last.id); set have end=last; by id notsorted; do year=start to end; if h.find() ne 0 then do; treatment=trt; rc=h.replace(); end; else do; treatment=catx('-',treatment, trt); rc=h.replace(); end; end; end; if last then rc=h.output(dataset:'long'); run; proc sql; create table want_clean as select treatment, min(year) as _start, max(year) as _end from long group by treatment; quit; /*will the following process be really necessary???*/ data want_messy; merge want_clean want_clean (firstobs=2 keep=treatment rename=treatment=_t); if find(_t,treatment, 't') then _end+1; if find(lag(treatment),treatment,'t') then _start=_start-1; drop _t; run;
... View more