BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
smorijawala
Calcite | Level 5

I have a dataset with start and stop dates for employees.

Employee idstart_dtstop_dt
1231/1/20144/1/2015
1233/1/201512/31/2016
1236/1/201812/2/2020
1231/2/2021 
2221/1/202012/3/2020
2224/2/201912/31/2019
3331/1/201812/31/2019
3331/2/201812/31/2019
3331/3/201812/31/2020

 

I want to find out how many concurrent time periods are in above dataset for each employee.

Employee id Conflicts
1231
2220
3333

 

Thank you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

I presume a "conflict" is when any pair of records (for a given id) overlap.  I.e., you want to compare each pair of records having the same id.

 

You can do that in a data step by maintaining 2 arrays:

  1. HISTORY array.  2-dimensions:  the first dimension has a row for each observation (I specify up to 10 rows below), the second dimension for all possible dates in your dataset (I specify 01jan2014 through 31mar2021).

  2. CONFLICTS array:  It's has a row for each obs, and a column for each obs.  If obs i and obs j overlap, then CONFLICTS{i,j}=1  (but not conflicts{j,i}.  So at the end of each id, just sum up the elements of the array:

data have;
infile cards expandtabs truncover;
input Employeeid	start_dt : mmddyy10.	stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123	1/1/2014	4/1/2015
123	3/1/2015	12/31/2016
123	6/1/2018	12/2/2020
123	1/2/2021	 
222	1/1/2020	12/3/2020
222	4/2/2019	12/31/2019
333	1/1/2018	12/31/2019
333	1/2/2018	12/31/2019
333	1/3/2018	12/31/2020
;

data want (keep=employeeid n_conflicts);
  array history{10,%sysevalf("01jan2014"d):%sysevalf("31mar2021"d)};
  array conflicts{10,10};

  do i=1 by 1 until (last.employeeid);
    set have;
    by employeeid;

    do d=start_dt to coalesce(stop_dt,start_dt);
      history{i,d}=1;
      if i>1 then do j=1 to i-1;
        if history{j,d}^=. then conflicts{i,j}=1;
      end;
    end;
  end;
  n_conflicts=sum(0,of conflicts{*});
run;

You have an obs with no stop_dt.  This program assumes stop_dt=start_dt in such a case.

 

If you expect up to 20 obs per id, then increase the array size accordingly (both HISTORY and CONFLICTS).  And if your date range is greater the jan2014-mar2021, just change the corresponding limits in the HISTORY array.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

Please supply usable data, as code.

 

Something like this should work. Adapt to your needs.

proc sql;
  select ID, sum(a.START<=b.START<=a.END)
  from (select * from HAVE a, HAVE b
        where a.ID=b.ID and a.START < b.START)
  group by ID;

 

Ksharp
Super User

How do you define CONFLICTS ?

 

 

data have;
infile cards expandtabs truncover;
input Employeeid	start_dt : mmddyy10.	stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123	1/1/2014	4/1/2015
123	3/1/2015	12/31/2016
123	6/1/2018	12/2/2020
123	1/2/2021	 
222	1/1/2020	12/3/2020
222	4/2/2019	12/31/2019
333	1/1/2018	12/31/2019
333	1/2/2018	12/31/2019
333	1/3/2018	12/31/2020
;
data temp;
 set have;
 by Employeeid;
 if first.Employeeid then n=0;
 n+1;

 if not missing(start_dt) and not missing(stop_dt) then do;
  do date=start_dt to stop_dt;
   output;
  end;
 end;
 else do; date=start_dt;output;end;
 keep  Employeeid date n;
 format date mmddyy10.;
run;

proc sql;
create table temp1 as
select  Employeeid, date , count(distinct n) as n
 from temp
  group by Employeeid, date;

create table want as
select Employeeid, max(n)-1 as conflicts
 from temp1
  group by Employeeid;
quit;
mkeintz
PROC Star

I presume a "conflict" is when any pair of records (for a given id) overlap.  I.e., you want to compare each pair of records having the same id.

 

You can do that in a data step by maintaining 2 arrays:

  1. HISTORY array.  2-dimensions:  the first dimension has a row for each observation (I specify up to 10 rows below), the second dimension for all possible dates in your dataset (I specify 01jan2014 through 31mar2021).

  2. CONFLICTS array:  It's has a row for each obs, and a column for each obs.  If obs i and obs j overlap, then CONFLICTS{i,j}=1  (but not conflicts{j,i}.  So at the end of each id, just sum up the elements of the array:

data have;
infile cards expandtabs truncover;
input Employeeid	start_dt : mmddyy10.	stop_dt :mmddyy10.;
format start_dt stop_dt mmddyy10.;
cards;
123	1/1/2014	4/1/2015
123	3/1/2015	12/31/2016
123	6/1/2018	12/2/2020
123	1/2/2021	 
222	1/1/2020	12/3/2020
222	4/2/2019	12/31/2019
333	1/1/2018	12/31/2019
333	1/2/2018	12/31/2019
333	1/3/2018	12/31/2020
;

data want (keep=employeeid n_conflicts);
  array history{10,%sysevalf("01jan2014"d):%sysevalf("31mar2021"d)};
  array conflicts{10,10};

  do i=1 by 1 until (last.employeeid);
    set have;
    by employeeid;

    do d=start_dt to coalesce(stop_dt,start_dt);
      history{i,d}=1;
      if i>1 then do j=1 to i-1;
        if history{j,d}^=. then conflicts{i,j}=1;
      end;
    end;
  end;
  n_conflicts=sum(0,of conflicts{*});
run;

You have an obs with no stop_dt.  This program assumes stop_dt=start_dt in such a case.

 

If you expect up to 20 obs per id, then increase the array size accordingly (both HISTORY and CONFLICTS).  And if your date range is greater the jan2014-mar2021, just change the corresponding limits in the HISTORY array.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 930 views
  • 1 like
  • 4 in conversation