I'm posting here for the first time so forgive me if I doing it incorrectly.
I'm a new sas user and am having trouble figuring out how to calculate a variable using a do loop.
The data I'm looking at is household vehicle trip activity. The variables include Trip ID, Trip start time, Trip end time, and Household ID.
What I'm trying to figure out is whether a trip's time overlaps with the other trip times of the same household. Here's an example of what I'm looking at.
Trip ID | Trip Start Time | Trip End Time | Household ID |
---|---|---|---|
1 | 8:00 | 17:00 | 999 |
2 | 9:30 | 9:45 | 999 |
3 | 10:00 | 18:00 | 999 |
My goal is to calculate the number of overlapping trips per household. In the final data-set each observation represents an unique household and the variable is the number of overlapping trips (in the case of the table above, 2)
Does anyone know how I can use a do loop to calculate whether the end time in one trip (say trip 1) is greater than the start time in the other trips (trip 2 and trip 3)?
I've never used a do loop before so any help would be greatly appreciated.
I tried the syntax below but it didn't work the way I wanted it to. Plus it doesn't capture households with more than 2 trips.
data travel_2
set travel
by houseid
overlap= .;
if (first.endtime>last.starttime) then overlap=1;
else overlap= 0;
run;
Question:
Is overlap a boolean variable or can be accumulated if more than one overlap within the group?
Haikuo
Haikuo,
Yes "overlap" should be a boolean variable. (1/0).
Richard
Since you have not laid out what your final output look like, this is just based on my guessing, and it can be easily tweaked:
data have;
input TID$ ST :time8. ET :time8. HID$;
format ST :time8. ET :time8.;
cards;
1 8:00 17:00 999
2 9:30 9:45 999
3 10:00 18:00 999
1 2:00 10:00 100
2 11:00 11:30 100
3 12:00 13:00 100
4 12:30 17:30 100
;
data want(drop=tid st et rc rename=(_tid=Trip_ID _st=Trip_start_time _et=Trip_end_time));
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have', multidata:'y');
h.definekey('hid');
h.definedata(all:'y');
h.definedone();
end;
overlap=0;
set have (rename=(tid=_tid st=_st et=_et));
do rc=h.find() by 0 while (rc=0);
if _tid ne tid and (_st<=st<=_et or _st<=et<=_et or (st>=_st and et<=_et) or (_st>=st and _et<=et)) then overlap+1;
if overlap>0 then leave;
rc=h.find_next();
end;
run;
proc print;run;
Haikuo
Here's an attempt, not sure it'll catch all the scenario's and not sure what you're looking for overall...
*Input your sample data;
data have;
input tripid start_time time5. end_time time5.;
format start_time end_time time5.;
household_id=999;
cards;
1 8:00 17:00
2 9:30 9:45
3 10:00 18:00
;
run;
*Find all overlapping trips
proc sql;
create table want as
select a.*, b.start_time as overlap_start, b.end_time as overlap_end, b.tripid as overlap_id
from have a
left join have b
on (b.start_time between a.start_time and a.end_time)
and a.tripid ne b.tripid and a.household_id=b.household_id;
quit;
*Count the number of overlaps per trip/household;
proc sql;
create table overlaps as
select household_id, tripid, count(overlap_id) as overlaps
from want
group by household_id, tripid;
*where overlap_start ne .;
quit;
Reeza,
on (b.start_time between a.start_time and a.end_time or b.end_time between a.start_time and a.end_time ) seems to catch more.
Regards,
Haikuo
You'll get duplicates then (ie Trip 1 overlaps with trip 3 and Trip 3 overlaps with Trip 1).
Yes, Reeza. I just realized that our differences were the results from different understanding towards OP's intention. if treating 'household_id' as the unit to count, your solution worked perfectly; while if treating each individual trip_id as unit to count, then my concern would make sense.
Haikuo
data have; input TID$ ST :time8. ET :time8. HID$; format ST :time8. ET :time8.; cards; 1 8:00 17:00 999 2 9:30 9:45 999 3 10:00 18:00 999 1 2:00 10:00 100 2 11:00 11:30 100 3 12:00 13:00 100 4 12:30 17:30 100 ; run; proc sql; create table temp as select a.hid,a.tid,b.tid as _tid from have as a,have as b where a.hid=b.hid and a.tid ne b.tid and (b.st between a.st and a.et or b.et between a.st and a.et ) order by hid; quit; data want(keep=hid count string); set temp; by hid; length string $ 800 ; retain string ; if not findw(string,strip(tid)) then string=catx(' ',string,tid); if not findw(string,strip(_tid)) then string=catx(' ',string,_tid); if last.hid then do; count=countw(string); output; call missing(string); end; run;
Ksharp
Hi...I need a way to calculate Overlap minutes with respect to US Trading Hours 9:30am to 4:00pm.
-----------------------9:30---------------------------------16:00-----------------
Scenario 1: 7:30--------:**************12:30
Scenario 2: 12:30******************:----18:00
Scenario 3: 11:00**************14:00
Scenario 4: 7:30--------:*************************************:----18:00
Scenario 5: ------------:********10:00 2:00-----
I guess only these 5 scenarios (Different Country Stock Exchange Trading Times) are possible of an overlap. Scenarios Represent Opening and Closing time for various exchanges whereas 9:30 to 16:00 is the US time.
An extra column for boolean if Overlaps or not can also be added.
You may take this as the Input Dataset
Input Data:
ID Beg_Time End_time USBegTime USEndTime Overlap IsOverlap
SE1 7:30 12:30 9:30 16:00 ? 1 (for yes)
SE2 12:30 18:00 9:30 16:00 ?
SE3 11:00 14:00 9:30 16:00 ?
SE4 7:30 18:00 9:30 16:00 ?
SE5 2:00 10:00 9:30 16:00 ?
SE6 1:00 7:00 9:30 16:00 ? 0 (For No.)
Please Help ASAP.
Thanks in Advance
Assuming your time is specified correctly and consistently the following should work.
data have;
informat beg_time end_time usbegtime usendtime time8.;
format beg_time end_time usbegtime usendtime time8.;
input ID $ Beg_Time End_time USBegTime USEndTime ;
cards;
SE1 7:30 12:30 9:30 16:00
SE2 12:30 18:00 9:30 16:00
SE3 11:00 14:00 9:30 16:00
SE4 7:30 18:00 9:30 16:00
SE5 2:00 10:00 9:30 16:00
SE6 1:00 7:00 9:30 16:00
;
run;
data want;
set have;
format overlap_start overlap_end overlap time8.;
if usbegtime < beg_time <usendtime
or usbegtime < End_time <usendtime
or beg_time < usbegtime <end_time
or beg_time < usendtime < end_time then overlap=1;
if overlap=1 then do;
overlap_start=max(usbegtime, beg_time);
overlap_end=min(usendtime, end_time);
end;
overlap=overlap_end-overlap_start;
run;
Thanks a ton for the last reply. I am stuck at a new prb. I have two datasets as follows
Dataset A
ID Exchange
A NYSE
B NASDAQ
C BSE
D NSE
Dataset B
ID Open Close
A 10 20
B 15 20
C 30 40
Dataset A contains more number of IDs than Dataset B. I wanted to Add the Column Exchange in Dataset B So that Final Output is like:
ID Open Close Exchange
A 10 20 NYSE
B 15 20 NASDAQ
C 30 40 BSE
When I use Merge statement, It automatically adds the ID values D and E as well. How to stop that and solve this prb.
Looking for your help.
Thanks.
Ritesh
Use the IN dataset option.
data want;
merge b(in=in1) a(in=in2);
by id;
if in1 and in2;
run;
Thanks Tom, for the reply. I tried it, but it reduces my data-set significantly. Am I doing something wrong here: I am new to the IN Option Use.
Thanks.
183 data taqsum.match123;
184 merge taqsum.match(in=in1) taqsum.symbol_homeexchange(in=in2);
185 by symbol;
186 if in1 and in2;
187 run;
NOTE: There were 169 observations read from the data set TAQSUM.MATCH.
NOTE: There were 463 observations read from the data set TAQSUM.SYMBOL_HOMEEXCHANGE.
NOTE: The data set TAQSUM.MATCH123 has 5 observations and 90 variables.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.03 seconds
Thanks Tom, I used:
data want;
merge b(in=in1) a(in=in2);
by id;
if in2;
run;
And it did the way I wanted. Thanks a lot again.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.