04-11-2012 12:00 PM
I'm posting here for the first time so forgive me if I doing it incorrectly.
I'm a new sas user and am having trouble figuring out how to calculate a variable using a do loop.
The data I'm looking at is household vehicle trip activity. The variables include Trip ID, Trip start time, Trip end time, and Household ID.
What I'm trying to figure out is whether a trip's time overlaps with the other trip times of the same household. Here's an example of what I'm looking at.
|Trip ID||Trip Start Time||Trip End Time||Household ID|
My goal is to calculate the number of overlapping trips per household. In the final data-set each observation represents an unique household and the variable is the number of overlapping trips (in the case of the table above, 2)
Does anyone know how I can use a do loop to calculate whether the end time in one trip (say trip 1) is greater than the start time in the other trips (trip 2 and trip 3)?
I've never used a do loop before so any help would be greatly appreciated.
I tried the syntax below but it didn't work the way I wanted it to. Plus it doesn't capture households with more than 2 trips.
if (first.endtime>last.starttime) then overlap=1;
else overlap= 0;
04-11-2012 02:05 PM
Since you have not laid out what your final output look like, this is just based on my guessing, and it can be easily tweaked:
input TID$ ST :time8. ET :time8. HID$;
format ST :time8. ET :time8.;
1 8:00 17:00 999
2 9:30 9:45 999
3 10:00 18:00 999
1 2:00 10:00 100
2 11:00 11:30 100
3 12:00 13:00 100
4 12:30 17:30 100
data want(drop=tid st et rc rename=(_tid=Trip_ID _st=Trip_start_time _et=Trip_end_time));
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have', multidata:'y');
set have (rename=(tid=_tid st=_st et=_et));
do rc=h.find() by 0 while (rc=0);
if _tid ne tid and (_st<=st<=_et or _st<=et<=_et or (st>=_st and et<=_et) or (_st>=st and _et<=et)) then overlap+1;
if overlap>0 then leave;
04-11-2012 12:58 PM
Here's an attempt, not sure it'll catch all the scenario's and not sure what you're looking for overall...
*Input your sample data;
input tripid start_time time5. end_time time5.;
format start_time end_time time5.;
1 8:00 17:00
2 9:30 9:45
3 10:00 18:00
*Find all overlapping trips
create table want as
select a.*, b.start_time as overlap_start, b.end_time as overlap_end, b.tripid as overlap_id
from have a
left join have b
on (b.start_time between a.start_time and a.end_time)
and a.tripid ne b.tripid and a.household_id=b.household_id;
*Count the number of overlaps per trip/household;
create table overlaps as
select household_id, tripid, count(overlap_id) as overlaps
group by household_id, tripid;
*where overlap_start ne .;
04-11-2012 03:20 PM
Yes, Reeza. I just realized that our differences were the results from different understanding towards OP's intention. if treating 'household_id' as the unit to count, your solution worked perfectly; while if treating each individual trip_id as unit to count, then my concern would make sense.
04-12-2012 04:08 AM
data have; input TID$ ST :time8. ET :time8. HID$; format ST :time8. ET :time8.; cards; 1 8:00 17:00 999 2 9:30 9:45 999 3 10:00 18:00 999 1 2:00 10:00 100 2 11:00 11:30 100 3 12:00 13:00 100 4 12:30 17:30 100 ; run; proc sql; create table temp as select a.hid,a.tid,b.tid as _tid from have as a,have as b where a.hid=b.hid and a.tid ne b.tid and (b.st between a.st and a.et or b.et between a.st and a.et ) order by hid; quit; data want(keep=hid count string); set temp; by hid; length string $ 800 ; retain string ; if not findw(string,strip(tid)) then string=catx(' ',string,tid); if not findw(string,strip(_tid)) then string=catx(' ',string,_tid); if last.hid then do; count=countw(string); output; call missing(string); end; run;
02-11-2013 05:25 PM
Hi...I need a way to calculate Overlap minutes with respect to US Trading Hours 9:30am to 4:00pm.
Scenario 1: 7:30--------:**************12:30
Scenario 2: 12:30******************:----18:00
Scenario 3: 11:00**************14:00
Scenario 4: 7:30--------:*************************************:----18:00
Scenario 5: ------------:********10:00 2:00-----
I guess only these 5 scenarios (Different Country Stock Exchange Trading Times) are possible of an overlap. Scenarios Represent Opening and Closing time for various exchanges whereas 9:30 to 16:00 is the US time.
An extra column for boolean if Overlaps or not can also be added.
You may take this as the Input Dataset
ID Beg_Time End_time USBegTime USEndTime Overlap IsOverlap
SE1 7:30 12:30 9:30 16:00 ? 1 (for yes)
SE2 12:30 18:00 9:30 16:00 ?
SE3 11:00 14:00 9:30 16:00 ?
SE4 7:30 18:00 9:30 16:00 ?
SE5 2:00 10:00 9:30 16:00 ?
SE6 1:00 7:00 9:30 16:00 ? 0 (For No.)
Please Help ASAP.
Thanks in Advance
02-11-2013 06:20 PM
Assuming your time is specified correctly and consistently the following should work.
informat beg_time end_time usbegtime usendtime time8.;
format beg_time end_time usbegtime usendtime time8.;
input ID $ Beg_Time End_time USBegTime USEndTime ;
SE1 7:30 12:30 9:30 16:00
SE2 12:30 18:00 9:30 16:00
SE3 11:00 14:00 9:30 16:00
SE4 7:30 18:00 9:30 16:00
SE5 2:00 10:00 9:30 16:00
SE6 1:00 7:00 9:30 16:00
format overlap_start overlap_end overlap time8.;
if usbegtime < beg_time <usendtime
or usbegtime < End_time <usendtime
or beg_time < usbegtime <end_time
or beg_time < usendtime < end_time then overlap=1;
if overlap=1 then do;
02-16-2013 12:23 PM
Thanks a ton for the last reply. I am stuck at a new prb. I have two datasets as follows
ID Open Close
A 10 20
B 15 20
C 30 40
Dataset A contains more number of IDs than Dataset B. I wanted to Add the Column Exchange in Dataset B So that Final Output is like:
ID Open Close Exchange
A 10 20 NYSE
B 15 20 NASDAQ
C 30 40 BSE
When I use Merge statement, It automatically adds the ID values D and E as well. How to stop that and solve this prb.
Looking for your help.
02-16-2013 12:47 PM
Thanks Tom, for the reply. I tried it, but it reduces my data-set significantly. Am I doing something wrong here: I am new to the IN Option Use.
183 data taqsum.match123;
184 merge taqsum.match(in=in1) taqsum.symbol_homeexchange(in=in2);
185 by symbol;
186 if in1 and in2;
NOTE: There were 169 observations read from the data set TAQSUM.MATCH.
NOTE: There were 463 observations read from the data set TAQSUM.SYMBOL_HOMEEXCHANGE.
NOTE: The data set TAQSUM.MATCH123 has 5 observations and 90 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.03 seconds