BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smorijawala
Calcite | Level 5

I have a dataset with start and stop dates for employees.

Employee idstart_dtstop_dt
1231/1/20144/1/2015
1233/1/201512/31/2016
1236/1/201812/2/2020
1231/2/2021 
2221/1/202012/3/2020
2224/2/201912/31/2019
3331/1/201812/31/2019
3331/2/201812/31/2019
3331/3/201812/31/2020

 

I want to find out how many concurrent time periods are in above dataset for each employee.

Employee id Conflicts
1231
2220
3333

 

Thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I presume a "conflict" is when any pair of records (for a given id) overlap.  I.e., you want to compare each pair of records having the same id.

 

You can do that in a data step by maintaining 2 arrays:

  1. HISTORY array.  2-dimensions:  the first dimension has a row for each observation (I specify up to 10 rows below), the second dimension for all possible dates in your dataset (I specify 01jan2014 through 31mar2021).

  2. CONFLICTS array:  It's has a row for each obs, and a column for each obs.  If obs i and obs j overlap, then CONFLICTS{i,j}=1  (but not conflicts{j,i}.  So at the end of each id, just sum up the elements of the array:

data have;
infile cards expandtabs truncover;
input Employeeid	start_dt : mmddyy10.	stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123	1/1/2014	4/1/2015
123	3/1/2015	12/31/2016
123	6/1/2018	12/2/2020
123	1/2/2021	 
222	1/1/2020	12/3/2020
222	4/2/2019	12/31/2019
333	1/1/2018	12/31/2019
333	1/2/2018	12/31/2019
333	1/3/2018	12/31/2020
;

data want (keep=employeeid n_conflicts);
  array history{10,%sysevalf("01jan2014"d):%sysevalf("31mar2021"d)};
  array conflicts{10,10};

  do i=1 by 1 until (last.employeeid);
    set have;
    by employeeid;

    do d=start_dt to coalesce(stop_dt,start_dt);
      history{i,d}=1;
      if i>1 then do j=1 to i-1;
        if history{j,d}^=. then conflicts{i,j}=1;
      end;
    end;
  end;
  n_conflicts=sum(0,of conflicts{*});
run;

You have an obs with no stop_dt.  This program assumes stop_dt=start_dt in such a case.

 

If you expect up to 20 obs per id, then increase the array size accordingly (both HISTORY and CONFLICTS).  And if your date range is greater the jan2014-mar2021, just change the corresponding limits in the HISTORY array.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

Please supply usable data, as code.

 

Something like this should work. Adapt to your needs.

proc sql;
  select ID, sum(a.START<=b.START<=a.END)
  from (select * from HAVE a, HAVE b
        where a.ID=b.ID and a.START < b.START)
  group by ID;

 

Ksharp
Super User

How do you define CONFLICTS ?

 

 

data have;
infile cards expandtabs truncover;
input Employeeid	start_dt : mmddyy10.	stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123	1/1/2014	4/1/2015
123	3/1/2015	12/31/2016
123	6/1/2018	12/2/2020
123	1/2/2021	 
222	1/1/2020	12/3/2020
222	4/2/2019	12/31/2019
333	1/1/2018	12/31/2019
333	1/2/2018	12/31/2019
333	1/3/2018	12/31/2020
;
data temp;
 set have;
 by Employeeid;
 if first.Employeeid then n=0;
 n+1;

 if not missing(start_dt) and not missing(stop_dt) then do;
  do date=start_dt to stop_dt;
   output;
  end;
 end;
 else do; date=start_dt;output;end;
 keep  Employeeid date n;
 format date mmddyy10.;
run;

proc sql;
create table temp1 as
select  Employeeid, date , count(distinct n) as n
 from temp
  group by Employeeid, date;

create table want as
select Employeeid, max(n)-1 as conflicts
 from temp1
  group by Employeeid;
quit;
mkeintz
PROC Star

I presume a "conflict" is when any pair of records (for a given id) overlap.  I.e., you want to compare each pair of records having the same id.

 

You can do that in a data step by maintaining 2 arrays:

  1. HISTORY array.  2-dimensions:  the first dimension has a row for each observation (I specify up to 10 rows below), the second dimension for all possible dates in your dataset (I specify 01jan2014 through 31mar2021).

  2. CONFLICTS array:  It's has a row for each obs, and a column for each obs.  If obs i and obs j overlap, then CONFLICTS{i,j}=1  (but not conflicts{j,i}.  So at the end of each id, just sum up the elements of the array:

data have;
infile cards expandtabs truncover;
input Employeeid	start_dt : mmddyy10.	stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123	1/1/2014	4/1/2015
123	3/1/2015	12/31/2016
123	6/1/2018	12/2/2020
123	1/2/2021	 
222	1/1/2020	12/3/2020
222	4/2/2019	12/31/2019
333	1/1/2018	12/31/2019
333	1/2/2018	12/31/2019
333	1/3/2018	12/31/2020
;

data want (keep=employeeid n_conflicts);
  array history{10,%sysevalf("01jan2014"d):%sysevalf("31mar2021"d)};
  array conflicts{10,10};

  do i=1 by 1 until (last.employeeid);
    set have;
    by employeeid;

    do d=start_dt to coalesce(stop_dt,start_dt);
      history{i,d}=1;
      if i>1 then do j=1 to i-1;
        if history{j,d}^=. then conflicts{i,j}=1;
      end;
    end;
  end;
  n_conflicts=sum(0,of conflicts{*});
run;

You have an obs with no stop_dt.  This program assumes stop_dt=start_dt in such a case.

 

If you expect up to 20 obs per id, then increase the array size accordingly (both HISTORY and CONFLICTS).  And if your date range is greater the jan2014-mar2021, just change the corresponding limits in the HISTORY array.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 368 views
  • 1 like
  • 4 in conversation