I have a dataset with start and stop dates for employees.
Employee id | start_dt | stop_dt |
123 | 1/1/2014 | 4/1/2015 |
123 | 3/1/2015 | 12/31/2016 |
123 | 6/1/2018 | 12/2/2020 |
123 | 1/2/2021 | |
222 | 1/1/2020 | 12/3/2020 |
222 | 4/2/2019 | 12/31/2019 |
333 | 1/1/2018 | 12/31/2019 |
333 | 1/2/2018 | 12/31/2019 |
333 | 1/3/2018 | 12/31/2020 |
I want to find out how many concurrent time periods are in above dataset for each employee.
Employee id | Conflicts |
123 | 1 |
222 | 0 |
333 | 3 |
Thank you.
I presume a "conflict" is when any pair of records (for a given id) overlap. I.e., you want to compare each pair of records having the same id.
You can do that in a data step by maintaining 2 arrays:
data have;
infile cards expandtabs truncover;
input Employeeid start_dt : mmddyy10. stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123 1/1/2014 4/1/2015
123 3/1/2015 12/31/2016
123 6/1/2018 12/2/2020
123 1/2/2021
222 1/1/2020 12/3/2020
222 4/2/2019 12/31/2019
333 1/1/2018 12/31/2019
333 1/2/2018 12/31/2019
333 1/3/2018 12/31/2020
;
data want (keep=employeeid n_conflicts);
array history{10,%sysevalf("01jan2014"d):%sysevalf("31mar2021"d)};
array conflicts{10,10};
do i=1 by 1 until (last.employeeid);
set have;
by employeeid;
do d=start_dt to coalesce(stop_dt,start_dt);
history{i,d}=1;
if i>1 then do j=1 to i-1;
if history{j,d}^=. then conflicts{i,j}=1;
end;
end;
end;
n_conflicts=sum(0,of conflicts{*});
run;
You have an obs with no stop_dt. This program assumes stop_dt=start_dt in such a case.
If you expect up to 20 obs per id, then increase the array size accordingly (both HISTORY and CONFLICTS). And if your date range is greater the jan2014-mar2021, just change the corresponding limits in the HISTORY array.
Please supply usable data, as code.
Something like this should work. Adapt to your needs.
proc sql;
select ID, sum(a.START<=b.START<=a.END)
from (select * from HAVE a, HAVE b
where a.ID=b.ID and a.START < b.START)
group by ID;
This works! Thank you!
How do you define CONFLICTS ?
data have; infile cards expandtabs truncover; input Employeeid start_dt : mmddyy10. stop_dt :mmddyy10.; format start_dt stop_dt mmddyy10.; cards; 123 1/1/2014 4/1/2015 123 3/1/2015 12/31/2016 123 6/1/2018 12/2/2020 123 1/2/2021 222 1/1/2020 12/3/2020 222 4/2/2019 12/31/2019 333 1/1/2018 12/31/2019 333 1/2/2018 12/31/2019 333 1/3/2018 12/31/2020 ; data temp; set have; by Employeeid; if first.Employeeid then n=0; n+1; if not missing(start_dt) and not missing(stop_dt) then do; do date=start_dt to stop_dt; output; end; end; else do; date=start_dt;output;end; keep Employeeid date n; format date mmddyy10.; run; proc sql; create table temp1 as select Employeeid, date , count(distinct n) as n from temp group by Employeeid, date; create table want as select Employeeid, max(n)-1 as conflicts from temp1 group by Employeeid; quit;
I presume a "conflict" is when any pair of records (for a given id) overlap. I.e., you want to compare each pair of records having the same id.
You can do that in a data step by maintaining 2 arrays:
data have;
infile cards expandtabs truncover;
input Employeeid start_dt : mmddyy10. stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123 1/1/2014 4/1/2015
123 3/1/2015 12/31/2016
123 6/1/2018 12/2/2020
123 1/2/2021
222 1/1/2020 12/3/2020
222 4/2/2019 12/31/2019
333 1/1/2018 12/31/2019
333 1/2/2018 12/31/2019
333 1/3/2018 12/31/2020
;
data want (keep=employeeid n_conflicts);
array history{10,%sysevalf("01jan2014"d):%sysevalf("31mar2021"d)};
array conflicts{10,10};
do i=1 by 1 until (last.employeeid);
set have;
by employeeid;
do d=start_dt to coalesce(stop_dt,start_dt);
history{i,d}=1;
if i>1 then do j=1 to i-1;
if history{j,d}^=. then conflicts{i,j}=1;
end;
end;
end;
n_conflicts=sum(0,of conflicts{*});
run;
You have an obs with no stop_dt. This program assumes stop_dt=start_dt in such a case.
If you expect up to 20 obs per id, then increase the array size accordingly (both HISTORY and CONFLICTS). And if your date range is greater the jan2014-mar2021, just change the corresponding limits in the HISTORY array.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.