I select a month's data at a time from two production tables. There can be 40k to 50k rows per month per table. Below is a sample of the data. I have to calculate the time it takes in days for a person to respond (an activity in table2) to an assignment (table1) taking into account holidays (skip holidays). The did column is only there to help someone see how the data should match up in the two tables. It is not part of the data extracted from the two production tables. The rows should be matched on pgm_id and user and in table2 we want the first row with a time greater than the time in table1. For instance, for did = 12 in table1 there are two possible rows in table2 (did=12,13) but we use the row with did = 12 because that activity happened first. For did = 1 in table 1 there is no corresponding row in table 2 so by default the calculation is 30 days. One of the issues is that for user = Dan there are several rows in both tables so how can we process them? I was wondering if it could be done in a loop on table1 with another loop inside of that one on table2. In this case, to process did = 3 in table1, if we had ordered the rows in table2 then we could start looping through the rows in table2 and the first row we get to with user=Dan and pgm_id = 63832680 would satisfy the requirements because its tiime is later than the time in table1. Then we exit that loop and go to the next row in the outside loop with did=4. We go back to the inside loop and when we get to did=4 it again satisfies the conditions and we do the calculation and exit the inside loop and so on.
I've never done a loop within a loop so the first question is will it work for this problem? If so, how and where do I store the results of the calculation. Also, how do I deal with holidays? I've read somewhere that there is a SAS function for holidays.
data table1;
infile datalines;
input did pgm_id user $ dttime DATETIME24.3;
format dttime DATETIME24.3;
return;
datalines;
1 60124541 BOB 14AUG2013:22:24:41.894
2 51452877 JAN 01AUG2013:19:20:58.785
3 63832680 DAN 08AUG2013:19:13:00.356
4 63832680 DAN 13AUG2013:19:29:39.578
5 63832680 DAN 16AUG2013:18:40:52.547
6 63832680 DAN 22AUG2013:13:57:12.506
7 63832680 DAN 23AUG2013:15:30:00.452
8 63832680 DAN 23AUG2013:19:42:55.160
9 63832680 DAN 26AUG2013:20:29:40.386
11 59796423 JUNE 05AUG2013:19:16:37.764
12 59796423 JUNE 14AUG2013:15:37:05.595
14 59829038 RON 27AUG2013:18:34:11.931
;
run;
data table2;
infile datalines;
input did pgm_id user $ dttime DATETIME24.3;
format dttime DATETIME24.3;
return;
datalines;
2 51452877 JAN 01AUG2013:23:59:58.285
3 63832680 DAN 08AUG2013:19:51:53.322
4 63832680 DAN 13AUG2013:20:01:24.427
5 63832680 DAN 16AUG2013:20:52:52.136
6 63832680 DAN 22AUG2013:14:50:58.576
7 63832680 DAN 23AUG2013:19:33:19.111
8 63832680 DAN 23AUG2013:20:27:52.563
9 63832680 DAN 28AUG2013:20:28:50.595
10 63832680 SUE 11AUG2013:22:24:41.894
11 59796423 JUNE 05AUG2013:19:24:22.605
12 59796423 JUNE 14AUG2013:19:48:40.025
13 59796423 JUNE 14AUG2013:19:58:59.937
14 59829038 RON 27AUG2013:21:58:11.160
;
run;
Thanks for the help.
DanD
Try this to get the start and end date times. The code below assumes you want an equijoin (only rows that have matching pgm_id and user in both tables). If you want all records in table1, even if there is no matching observation in table2, we will need to use a left join instead.
You will also need to add logic to deal with calculating the time between the two dates . . . there are plenty of examples in these forums and elsewhere how to calculate intervals excluding holidays.
proc sql;
create table want as
select t1.did
, t1.pgm_id
, t1.user
, t1.dttime as start_dtm
, t2.dttime as end_dtm
from table1 t1
, table2 t2
where t1.pgm_id = t2.pgm_id
and t1.user = t2.user
and t2.dttime =
( select min ( t3.dttime )
from table2 t3
where t1.pgm_id = t3.pgm_id
and t1.user = t3.user
and t1.dttime < t3.dttime )
order by did
;
quit
Try this to get the start and end date times. The code below assumes you want an equijoin (only rows that have matching pgm_id and user in both tables). If you want all records in table1, even if there is no matching observation in table2, we will need to use a left join instead.
You will also need to add logic to deal with calculating the time between the two dates . . . there are plenty of examples in these forums and elsewhere how to calculate intervals excluding holidays.
proc sql;
create table want as
select t1.did
, t1.pgm_id
, t1.user
, t1.dttime as start_dtm
, t2.dttime as end_dtm
from table1 t1
, table2 t2
where t1.pgm_id = t2.pgm_id
and t1.user = t2.user
and t2.dttime =
( select min ( t3.dttime )
from table2 t3
where t1.pgm_id = t3.pgm_id
and t1.user = t3.user
and t1.dttime < t3.dttime )
order by did
;
quit
I thought of doing it that way but decided that it wouldn't work and I didn't even try it. But it does work and it is much simpler than working with loops. Thanks Fugue.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.