Hi Everyone,
I am trying to calculate the number of absence days (business days only) for employees. I have multiple records for each employee and the records can be duplicate or have overlapping dates. I am using Alice. M Cheng method for calculating intervals and accounting for overlaps. The missing part is how to incorporate business days in the code. I tried to incorporate the intck function but it didn't work .
The data looks as follows
ID startdate enddate
1 1-1-2015 1-17-2015
1 1-1-2015 1-25-2015
1 2-7-2015 2-25-2015
2 3-4-2016 5-8-2016
2 3-4-2016 5-12-2016
2 3-4-2016 5-12- 2016
The code used to calculate intervals
proc sort data=Dates1;
by ID startdate ;
run;
proc transpose data=Dates1 out=STARTDT (drop=_name_) prefix=START;
by ID;
var startdate;
run;
proc transpose data=Dates1 out=STOPDT (drop=_name_) prefix=STOP;
by ID;
var enddate;
run;
data AE_DURH;
merge STARTDT STOPDT;
by ENROLID;
run;
data METHOD1;
retain TOTDUR;
set AE_DURH; *--- Use Horizontal Data Representation. ---*;
array START{*} START:;
array STOP{*} STOP:;
TOTDUR=STOP1-START1+1
;
do i=2 to dim(START);
do j=1 to i-1;
*--- Make overlapping but not embedded intervals disjoint. ---*;
if . < START(i) <= STOP(j) and STOP(i) > STOP(j) > . then
do;
START(i)=STOP(j)+1;
end;
*--- Embedded in previous interval. Will not contribute to duration. ---*;
else
do;
if . < START(i) <= STOP(j) and . < STOP(i) <= STOP(j) then
do;
START(i)=.;
STOP(i)=.;
end;
end;
end;
if START(i) ne . and STOP(i) ne . then DUR=STOP(i)-START(i)+1;
else DUR=0;
TOTDUR+DUR;
end;
label TOTDUR='DURATION |(DAYS)';
run;
You can also do this with arrays
data have;
input ID (startdate enddate) (:mmddyy.);
format startdate enddate yymmdd10.;
datalines;
1 1-1-2015 1-17-2015
1 1-1-2015 1-25-2015
1 2-7-2015 2-25-2015
2 3-4-2016 5-8-2016
2 3-4-2016 5-12-2016
2 3-4-2016 5-12-2016
;
proc sql;
select min(startdate), max(enddate)
into :from, :to
from have;
quit;
data want;
array d {&from : &to} _temporary_;
call missing(of d{*});
do until(last.id);
set have; by id;
do i = startdate to enddate;
if
weekday(i) not in (1, 7) and /* weekends */
not (month(i)=12 and day(i)=25) and /* Christmas */
not (month(i)=1 and day(i)=1) /* New Year */
/* ... etc. */
then d{i} = 1;
end;
end;
duration = sum(of d{*});
keep ID duration;
run;
just complete the list of holidays for your locale.
Edit: The HOLIDAY function can help you define those if you are in USA or Canada.
Didn't work is awful vague.
Are there errors in the log?: Post the code and log in a code box opened with the {i} to maintain formatting of error messages.
No output? Post any log in a code box.
Unexpected output? Provide input data in the form of data step code pasted into a code box, the actual results and the expected results. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Hi ballardw,
Thank you for your response. I have zero values for duration DUR and total duration TOTDUR. here is the log, the output and the function that I have used. The code seems to work in disjointing the overlapping periods but the formula to calculate the dates is not working.
P.S. The link in your comment is not working.
data METHOD1;
retain TOTDUR;
set AE_DURH; *--- Use Horizontal Data Representation. ---*;
array START{*} START:;
array STOP{*} STOP:;
TOTDUR=intck('dtday', startdate, enddate)+1;
do i=2 to dim(START);
do j=1 to i-1;
*--- Make overlapping but not embedded intervals disjoint. ---*;
if . < START(i) <= STOP(j) and STOP(i) > STOP(j) > . then
do;
START(i)=STOP(j)+1;
end;
*--- Embedded in previous interval. Will not contribute to duration. ---*;
else
do;
if . < START(i) <= STOP(j) and . < STOP(i) <= STOP(j) then
do;
START(i)=.;
STOP(i)=.;
end;
end;
end;
if START(i) ne . and STOP(i) ne . then DUR=STOP(i)-START(i)+1;
else DUR=0;
TOTDUR+DUR;
end;
label TOTDUR='DURATION |(DAYS)';
run;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
773:8 773:42 775:10 780:8 780:36 787:4 787:22 787:44 787:52
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
775:1 782:1 783:1
NOTE: Variable startdate is uninitialized.
NOTE: Variable enddate is uninitialized.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1039 at 769:8 1039 at 769:36
NOTE: There were 1039 observations read from the data set WORK.AE_DURH.
NOTE: The data set WORK.METHOD1 has 1039 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
output
ID | _LABEL_ | START1 | START2 | START3 | START4 | STOP1 | STOP2 | STOP3 | STOP4 | startdate | enddate | i | j | DUR | TOTDUR |
1 | enddate | 42103 | . | 42150 | . | . | . | 5 | 4 | 0 | 0 | ||||
2 | enddate | 41625 | 41702 | . | . | 5 | 4 | 0 | 0 | ||||||
3 | enddate | 41670 | 41862 | 41670 | 41981 | . | . | 5 | 4 | 0 | 120 | ||||
4 | enddate | 41701 | 41905 | 41943 | 42044 | 41701 | 41906 | 41946 | 42100 | . | . | 5 | 4 | 57 | 63 |
5 | enddate | 41807 | 41855 | . | . | 5 | 4 | 0 | 0 | ||||||
6 | enddate | 41563 | 41582 | . | . | 5 | 4 | 0 | 0 | ||||||
7 | enddate | 42086 | 42139 | . | . | 5 | 4 | 0 | 0 |
The note here is pretty indicative:
NOTE: Variable startdate is uninitialized. NOTE: Variable enddate is uninitialized.
Your data set AE_DURH likely does not have the variables you think it does. Your references to those two variables in the created them in the intck function reference probably created them. So TOTDUR starts off missing.
Your code
if START(i) ne . and STOP(i) ne . then DUR=STOP(i)-START(i)+1;
else DUR=0;
means that at the end of the loop dur is going to have the comparison for the last pair of stop start, which is 0 because of the ELSE for most of your records.
A possible fix for the "dur" issue is to remove the Else. Maybe. I'm not sure what you are actually attempting to accomplish.
I am sorry for the typos. I did fixed that in the code and still not getting the result that I am supposed to get. I need to calculate the total number of business absence days per employee. In this example employee number two has 3 records, the first record has the same start date but different end date than the rest of the records. The second record is embedded in the first one, and the third records is just a duplicate for the second record. I cannot calculate the total number by summing the absence days for all records because they are not unique. I wanna get something like this for example
ID Total duration
1 24 days
2 49 days
I was able to get the total number of days by
TOTDUR=STOP1-START1+1
But I am interested in business days only, not all days between start and end date. You can try this code and put
TOTDUR=STOP1-START1+1 instead of
TOTDUR=intck('WEEKDAY', START1+1, STOP1)
data have;
input ID (startdate enddate) (:mmddyy.);
format startdate enddate yymmdd10.;
datalines;
1 1-1-2015 1-17-2015
1 1-1-2015 1-25-2015
1 2-7-2015 2-25-2015
2 3-4-2016 5-8-2016
2 3-4-2016 5-12-2016
2 3-4-2016 5-12-2016
;
proc sort data= have ;
by ID startdate;
run;
proc transpose data=have out=STARTDT (drop=_name_) prefix=START;
by ID;
var startdate;
run;
proc transpose data=have out=STOPDT (drop=_name_) prefix=STOP;
by ID;
var enddate;
run;
data AE_DURH;
merge STARTDT STOPDT;
by ID;
run;
data METHOD1;
retain TOTDUR;
set AE_DURH; *--- Use Horizontal Data Representation. ---*;
array START{*} START:;
array STOP{*} STOP:;
TOTDUR=intck('WEEKDAY', START1+1, STOP1);
do i=2 to dim(START);
do j=1 to i-1;
*--- Make overlapping but not embedded intervals disjoint. ---*;
if . < START(i) <= STOP(j) and STOP(i) > STOP(j) > . then
do;
START(i)=STOP(j)+1;
end;
*--- Embedded in previous interval. Will not contribute to duration. ---*;
else
do;
if . < START(i) <= STOP(j) and . < STOP(i) <= STOP(j) then
do;
START(i)=.;
STOP(i)=.;
end;
end;
end;
if START(i) ne . and STOP(i) ne . then DUR=STOP(i)-START(i)+1;
else DUR=0;
TOTDUR+DUR;
end;
label TOTDUR='DURATION |(DAYS)';
run;
proc print data=METHOD1;
run;
You can also do this with arrays
data have;
input ID (startdate enddate) (:mmddyy.);
format startdate enddate yymmdd10.;
datalines;
1 1-1-2015 1-17-2015
1 1-1-2015 1-25-2015
1 2-7-2015 2-25-2015
2 3-4-2016 5-8-2016
2 3-4-2016 5-12-2016
2 3-4-2016 5-12-2016
;
proc sql;
select min(startdate), max(enddate)
into :from, :to
from have;
quit;
data want;
array d {&from : &to} _temporary_;
call missing(of d{*});
do until(last.id);
set have; by id;
do i = startdate to enddate;
if
weekday(i) not in (1, 7) and /* weekends */
not (month(i)=12 and day(i)=25) and /* Christmas */
not (month(i)=1 and day(i)=1) /* New Year */
/* ... etc. */
then d{i} = 1;
end;
end;
duration = sum(of d{*});
keep ID duration;
run;
just complete the list of holidays for your locale.
Edit: The HOLIDAY function can help you define those if you are in USA or Canada.
Hi PGStats,
Thank you so much. It worked and I got the numbers.
Can I incorporate a formula in the array?. I wanna put index date to calculate absence days up until the index date. The index date is not fixed for all employees, instead , it is 3 months from the day of first absence. I added an extra record for employee number 2. This absence is after the index date so it shouldn't be counted.
if end.date < index.date then from startdate to enddate
if startdate > index.date then from start.date to start.date
else if from start.date to index.date.
data have;
input ID (startdate enddate indexdate ) (:mmddyy.);
format startdate enddate indexdate yymmdd10.;
datalines;
1 1-1-2015 1-17-2015 4-1-2015
1 1-1-2015 1-25-2015 4-1-2015
1 2-7-2015 2-25-2015 4-1-2015
2 3-4-2016 5-8-2016 6-4-2016
2 3-4-2016 5-12-2016 6-4-2016
2 3-4-2016 5-12-2016 6-4-2016
2 8-5-2016 8-17-2016 6-4-2016
;
I guess you need:
data have;
input ID (startdate enddate indexdate ) (:mmddyy.);
format startdate enddate indexdate yymmdd10.;
datalines;
1 1-1-2015 1-17-2015 4-1-2015
1 1-1-2015 1-25-2015 4-1-2015
1 2-7-2015 2-25-2015 4-1-2015
2 3-4-2016 5-8-2016 6-4-2016
2 3-4-2016 5-12-2016 6-4-2016
2 3-4-2016 5-12-2016 6-4-2016
2 8-5-2016 8-17-2016 6-4-2016
;
proc sql noprint;
select min(startdate), max(enddate)
into :from, :to
from have;
quit;
data want;
array d {&from : &to} _temporary_;
call missing(of d{*});
do until(last.id);
set have; by id;
if startDate < indexDate then do;
do i = startdate to min(indexDate-1, enddate);
if
weekday(i) not in (1, 7) and /* weekends */
not (month(i)=12 and day(i)=25) and /* Christmas */
not (month(i)=1 and day(i)=1) /* New Year */
/* ... etc. */
then d{i} = 1;
end;
end;
end;
duration = sum(of d{*});
keep ID duration;
run;
Check if the -1 is required in min(indexDate-1, endDate).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.