I have a dataset with start and stop dates for employees.
Employee id | start_dt | stop_dt |
123 | 1/1/2014 | 4/1/2015 |
123 | 3/1/2015 | 12/31/2016 |
123 | 6/1/2018 | 12/2/2020 |
123 | 1/2/2021 | |
222 | 1/1/2020 | 12/3/2020 |
222 | 4/2/2019 | 12/31/2019 |
333 | 1/1/2018 | 12/31/2019 |
333 | 1/2/2018 | 12/31/2019 |
333 | 1/3/2018 | 12/31/2020 |
I want to find out how many concurrent time periods are in above dataset for each employee.
Employee id | Conflicts |
123 | 1 |
222 | 0 |
333 | 3 |
Thank you.
I presume a "conflict" is when any pair of records (for a given id) overlap. I.e., you want to compare each pair of records having the same id.
You can do that in a data step by maintaining 2 arrays:
data have;
infile cards expandtabs truncover;
input Employeeid start_dt : mmddyy10. stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123 1/1/2014 4/1/2015
123 3/1/2015 12/31/2016
123 6/1/2018 12/2/2020
123 1/2/2021
222 1/1/2020 12/3/2020
222 4/2/2019 12/31/2019
333 1/1/2018 12/31/2019
333 1/2/2018 12/31/2019
333 1/3/2018 12/31/2020
;
data want (keep=employeeid n_conflicts);
array history{10,%sysevalf("01jan2014"d):%sysevalf("31mar2021"d)};
array conflicts{10,10};
do i=1 by 1 until (last.employeeid);
set have;
by employeeid;
do d=start_dt to coalesce(stop_dt,start_dt);
history{i,d}=1;
if i>1 then do j=1 to i-1;
if history{j,d}^=. then conflicts{i,j}=1;
end;
end;
end;
n_conflicts=sum(0,of conflicts{*});
run;
You have an obs with no stop_dt. This program assumes stop_dt=start_dt in such a case.
If you expect up to 20 obs per id, then increase the array size accordingly (both HISTORY and CONFLICTS). And if your date range is greater the jan2014-mar2021, just change the corresponding limits in the HISTORY array.
Please supply usable data, as code.
Something like this should work. Adapt to your needs.
proc sql;
select ID, sum(a.START<=b.START<=a.END)
from (select * from HAVE a, HAVE b
where a.ID=b.ID and a.START < b.START)
group by ID;
This works! Thank you!
How do you define CONFLICTS ?
data have; infile cards expandtabs truncover; input Employeeid start_dt : mmddyy10. stop_dt :mmddyy10.; format start_dt stop_dt mmddyy10.; cards; 123 1/1/2014 4/1/2015 123 3/1/2015 12/31/2016 123 6/1/2018 12/2/2020 123 1/2/2021 222 1/1/2020 12/3/2020 222 4/2/2019 12/31/2019 333 1/1/2018 12/31/2019 333 1/2/2018 12/31/2019 333 1/3/2018 12/31/2020 ; data temp; set have; by Employeeid; if first.Employeeid then n=0; n+1; if not missing(start_dt) and not missing(stop_dt) then do; do date=start_dt to stop_dt; output; end; end; else do; date=start_dt;output;end; keep Employeeid date n; format date mmddyy10.; run; proc sql; create table temp1 as select Employeeid, date , count(distinct n) as n from temp group by Employeeid, date; create table want as select Employeeid, max(n)-1 as conflicts from temp1 group by Employeeid; quit;
I presume a "conflict" is when any pair of records (for a given id) overlap. I.e., you want to compare each pair of records having the same id.
You can do that in a data step by maintaining 2 arrays:
data have;
infile cards expandtabs truncover;
input Employeeid start_dt : mmddyy10. stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123 1/1/2014 4/1/2015
123 3/1/2015 12/31/2016
123 6/1/2018 12/2/2020
123 1/2/2021
222 1/1/2020 12/3/2020
222 4/2/2019 12/31/2019
333 1/1/2018 12/31/2019
333 1/2/2018 12/31/2019
333 1/3/2018 12/31/2020
;
data want (keep=employeeid n_conflicts);
array history{10,%sysevalf("01jan2014"d):%sysevalf("31mar2021"d)};
array conflicts{10,10};
do i=1 by 1 until (last.employeeid);
set have;
by employeeid;
do d=start_dt to coalesce(stop_dt,start_dt);
history{i,d}=1;
if i>1 then do j=1 to i-1;
if history{j,d}^=. then conflicts{i,j}=1;
end;
end;
end;
n_conflicts=sum(0,of conflicts{*});
run;
You have an obs with no stop_dt. This program assumes stop_dt=start_dt in such a case.
If you expect up to 20 obs per id, then increase the array size accordingly (both HISTORY and CONFLICTS). And if your date range is greater the jan2014-mar2021, just change the corresponding limits in the HISTORY array.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.