Hello,
I have data with the duration of the contract. However, some data is unstructured and date contracts overlap. In these cases, I would like the term of the contract to be in line with other contracts. Below is an example of what the data looks like:
ID | START_DATE | STOP_DATE |
1 | 2019-01-01 | 2019-12-31 |
1 | 2020-01-01 | 2020-12-31 |
2 | 2020-01-01 | 2020-10-31 |
2 | 2020-01-07 | 2021-12-31 |
2 | 2020-09-01 | 2022-12-31 |
ID 1 is shown as valid. I would like to transform ID 2 into the following form:
ID | START_DATE | STOP_DATE |
1 | 2019-01-01 | 2019-12-31 |
1 | 2020-01-01 | 2020-12-31 |
2 | 2020-01-01 | 2020-01-06 |
2 | 2020-01-07 | 2020-08-31 |
2 | 2020-09-01 | 2022-12-31 |
I know the basics of SAS and I don't know how to solve this task. Can You help me?
Here is a method for looking ahead to see the value of START_DATE on the next observation.
data want ;
set have ;
by id start_date stop_date ;
set have(keep=start_date rename=(start_date=next_date) firstobs=2)
have(obs=1 drop=_all_)
;
if last.id then next_date=.;
new_stop = min(stop_date,next_date-1);
format new_stop yymmdd10.;
run;
Obs ID START_DATE STOP_DATE next_date new_stop 1 1 2019-01-01 2019-12-31 2020-01-01 2019-12-31 2 1 2020-01-01 2020-12-31 . 2020-12-31 3 2 2020-01-01 2020-10-31 2020-01-07 2020-01-06 4 2 2020-01-07 2021-12-31 2020-09-01 2020-08-31 5 2 2020-09-01 2022-12-31 . 2022-12-31
Can you please explain your business logic in more detail?
You want to modify the current stop_date if the following start_date overlaps:
data want (drop=nxt_:) ;
set have (keep=id);
by id;
merge have have (firstobs=2 keep=start_date rename=(start_date=nxt_start));
if last.id=0 then stop_date=min(stop_date,nxt_start-1);
run;
Here is a method for looking ahead to see the value of START_DATE on the next observation.
data want ;
set have ;
by id start_date stop_date ;
set have(keep=start_date rename=(start_date=next_date) firstobs=2)
have(obs=1 drop=_all_)
;
if last.id then next_date=.;
new_stop = min(stop_date,next_date-1);
format new_stop yymmdd10.;
run;
Obs ID START_DATE STOP_DATE next_date new_stop 1 1 2019-01-01 2019-12-31 2020-01-01 2019-12-31 2 1 2020-01-01 2020-12-31 . 2020-12-31 3 2 2020-01-01 2020-10-31 2020-01-07 2020-01-06 4 2 2020-01-07 2021-12-31 2020-09-01 2020-08-31 5 2 2020-09-01 2022-12-31 . 2022-12-31
Next code is tested:
data have;
infile cards dlm = '09'x truncover;
input ID Start_date yymmdd10. @+1 Stop_date yymmdd10.;
format start_date stop_date yymmdd10.;
cards;
1 2019-01-01 2019-12-31
1 2020-01-01 2020-12-31
2 2020-01-01 2020-10-31
2 2020-01-07 2021-12-31
2 2020-09-01 2022-12-31
; run;
proc sort data=have;
by ID descending start_date descending stop_date;
run;
data want;
set have;
by ID;
prev_start = lag(start_date);
if not first.ID and
stop_date > prev_start
then stop_date = prev_start -1;
drop prev_start;
run;
proc sort data=want;
by ID start_date stop_date;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.