Hello everyone,
Has anyone done something like this before?
I have the following example dataset:
I want to use this dataset to create a new one, where Obs 6 is dropped, due to occurring within the range of dates in Obs 5. In other words, since the range of dates Nov2 - Nov4, 2019 is subsumed into a longer time range above, Nov1-Nov10, 2019 - I don't need Obs 6 at all and want to somehow get rid of it.
Below is the code that creates this example and also creates the dataset that I would like to create. The point is somehow flagging Obs 6, so that I can then drop it, so the eventual dataset doesn't have to be 100% identical to what wrote in the second part of the code. As long as I can somehow flag that record.
/*Example of initial dataset.*/ data example; input employee_id $ startdate :DATE8. enddate :DATE8.; format startdate enddate DATE8.; datalines; 12345 21JUN20 24JUN20 12345 28JUN20 18JUL20 12345 29JUL20 14AUG20 12345 09NOV20 20NOV20 67890 01NOV19 10NOV19 67890 02NOV19 04NOV19 ; run; proc print data=example; run; /*Example of eventual dataset.*/ data want; input employee_id $ startdate :DATE8. enddate :DATE8. final_startdate :DATE8. final_enddate :DATE8. drop $; format startdate enddate final_startdate final_enddate DATE8.; datalines; 12345 21JUN20 24JUN20 21JUN20 24JUN20 no 12345 28JUN20 18JUL20 28JUN20 18JUL20 no 12345 29JUL20 14AUG20 29JUL20 14AUG20 no 12345 09NOV20 20NOV20 09NOV20 20NOV20 no 67890 01NOV19 10NOV19 01NOV19 10NOV19 no 67890 02NOV19 04NOV19 . . yes ; run; proc print data=want; run;
Thank you!
data example;
input employee_id $ startdate :DATE8. enddate :DATE8.;
format startdate enddate DATE8.;
datalines;
12345 21JUN20 24JUN20
12345 28JUN20 18JUL20
12345 29JUL20 14AUG20
12345 09NOV20 20NOV20
67890 01NOV19 10NOV19
67890 02NOV19 04NOV19
;
run;
data temp;
set example;
do date=startdate to enddate ;
output;
end;
keep employee_id date;
format date date9.;
run;
proc sort data=temp nodupkey;
by employee_id date;
run;
data temp;
set temp;
by employee_id;
if first.employee_id or dif(date) ne 1 then group+1;
run;
proc summary data=temp;
by employee_id group;
var date;
output out=want min=startdate max=enddate;
run;
data example;
input employee_id $ startdate :DATE8. enddate :DATE8.;
format startdate enddate DATE8.;
datalines;
12345 21JUN20 24JUN20
12345 28JUN20 18JUL20
12345 29JUL20 14AUG20
12345 09NOV20 20NOV20
67890 01NOV19 10NOV19
67890 02NOV19 04NOV19
;
run;
data temp;
set example;
do date=startdate to enddate ;
output;
end;
keep employee_id date;
format date date9.;
run;
proc sort data=temp nodupkey;
by employee_id date;
run;
data temp;
set temp;
by employee_id;
if first.employee_id or dif(date) ne 1 then group+1;
run;
proc summary data=temp;
by employee_id group;
var date;
output out=want min=startdate max=enddate;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.