BookmarkSubscribeRSS Feed
rhnam
Calcite | Level 5

I'm posting here for the first time so forgive me if I doing it incorrectly.

I'm a new sas user and am having trouble figuring out how to calculate a variable using a do loop.

The data I'm looking at is household vehicle trip activity. The variables include Trip ID, Trip start time, Trip end time, and Household ID.

What I'm trying to figure out is whether a trip's time overlaps with the other trip times of the same household. Here's an example of what I'm looking at.

Trip IDTrip Start TimeTrip End TimeHousehold ID
18:0017:00999
29:309:45999
310:0018:00999

My goal is to calculate the number of overlapping trips per household. In the final data-set each observation represents an unique household and the variable is the number of overlapping trips (in the case of the table above, 2)

Does anyone know how I can use a do loop to calculate whether  the end time in one trip (say trip 1) is greater than the start time in the other trips (trip 2 and trip 3)?

I've never used a do loop before so any help would be greatly appreciated.

I tried the syntax below but it didn't work the way I wanted it to. Plus it doesn't capture households  with more than 2 trips.

data travel_2

set travel

by houseid

overlap= .;

if (first.endtime>last.starttime) then overlap=1;

else overlap= 0;

run;

14 REPLIES 14
Haikuo
Onyx | Level 15

Question:

Is overlap a boolean variable or can be accumulated if more than one overlap within the group?

Haikuo

rhnam
Calcite | Level 5

Haikuo,

Yes "overlap" should be a boolean variable. (1/0).

Richard

Haikuo
Onyx | Level 15

Since you have not laid out what your final output look like, this is just based on my guessing, and it can be easily tweaked:

data have;

input TID$ ST :time8. ET :time8. HID$;

format ST :time8. ET :time8.;

cards;

1 8:00 17:00 999

2 9:30 9:45 999

3 10:00 18:00 999

1 2:00 10:00 100

2 11:00 11:30 100

3 12:00 13:00 100

4 12:30 17:30 100

;

data want(drop=tid st et rc rename=(_tid=Trip_ID _st=Trip_start_time _et=Trip_end_time));

  if _n_=1 then do;

  if 0 then set have;

dcl hash h(dataset:'have', multidata:'y');

h.definekey('hid');

h.definedata(all:'y');

h.definedone();

  end;

  overlap=0;

  set have (rename=(tid=_tid st=_st et=_et));

  do rc=h.find() by 0 while (rc=0);

  if _tid ne tid and (_st<=st<=_et or _st<=et<=_et or (st>=_st and et<=_et) or (_st>=st and _et<=et)) then overlap+1;

if overlap>0 then leave;

rc=h.find_next();

end;

run;

proc print;run;

Haikuo

Reeza
Super User

Here's an attempt, not sure it'll catch all the scenario's and not sure what you're looking for overall...

*Input your sample data;

data have;

    input tripid start_time time5. end_time time5.;

    format start_time end_time time5.;

    household_id=999;

    cards;

1 8:00 17:00

2 9:30 9:45

3 10:00 18:00

;

run;

*Find all overlapping trips

proc sql;

    create table want as

    select a.*, b.start_time as overlap_start, b.end_time as overlap_end, b.tripid as overlap_id

    from have a

    left join have b

    on (b.start_time between a.start_time and a.end_time)

    and a.tripid ne b.tripid and a.household_id=b.household_id;

quit;

*Count the number of overlaps per trip/household;

proc sql;

    create table overlaps as

    select household_id, tripid, count(overlap_id) as overlaps

    from want

    group by household_id, tripid;

    *where overlap_start ne .;

quit;

Haikuo
Onyx | Level 15

Reeza,

on (b.start_time between a.start_time and a.end_time or  b.end_time between a.start_time and a.end_time ) seems to catch more.

Regards,

Haikuo

Reeza
Super User

You'll get duplicates then (ie Trip 1 overlaps with trip 3 and Trip 3 overlaps with Trip 1).

Haikuo
Onyx | Level 15

Yes, Reeza. I just realized that our differences were the results from  different understanding towards OP's intention. if treating 'household_id' as the unit to count, your solution worked perfectly; while if treating each individual trip_id as unit to count, then my concern would make sense.

Haikuo

Ksharp
Super User
data have;
input TID$ ST :time8. ET :time8. HID$;
format ST :time8. ET :time8.;
cards;
1 8:00 17:00 999
2 9:30 9:45 999
3 10:00 18:00 999
1 2:00 10:00 100
2 11:00 11:30 100
3 12:00 13:00 100
4 12:30 17:30 100
;
run;
proc sql;
create table temp as
 select a.hid,a.tid,b.tid as _tid
  from have as a,have as b
   where a.hid=b.hid and a.tid ne b.tid and 
    (b.st between a.st and a.et or b.et between a.st and a.et )
   order by hid;
quit;
data want(keep=hid count string);
 set temp;
 by hid;
 length string $ 800 ;
 retain string ;
 if not findw(string,strip(tid)) then string=catx(' ',string,tid);
 if not findw(string,strip(_tid)) then string=catx(' ',string,_tid);
 if last.hid then do;
                    count=countw(string);
                    output;
                    call missing(string);
                  end;
run;

Ksharp

rkdubey84
Obsidian | Level 7

Hi...I need a way to calculate Overlap minutes with respect to US Trading Hours 9:30am to 4:00pm.

-----------------------9:30---------------------------------16:00-----------------

Scenario 1: 7:30--------:**************12:30

Scenario 2:                            12:30******************:----18:00

Scenario 3:                     11:00**************14:00

Scenario 4: 7:30--------:*************************************:----18:00

Scenario 5: ------------:********10:00                                  2:00-----

I guess only these 5 scenarios (Different Country Stock Exchange Trading Times) are possible of an overlap. Scenarios Represent Opening and Closing time for various exchanges whereas 9:30 to 16:00 is the US time.

An extra column for boolean if Overlaps or not can also be added.

You may take this as the Input Dataset

Input Data:

ID     Beg_Time      End_time    USBegTime    USEndTime   Overlap  IsOverlap

SE1      7:30          12:30               9:30              16:00          ?           1 (for yes)

SE2      12:30         18:00              9:30              16:00          ?

SE3      11:00         14:00              9:30              16:00          ?

SE4      7:30          18:00               9:30              16:00         ?

SE5      2:00          10:00               9:30              16:00         ?

SE6      1:00          7:00                 9:30              16:00         ?             0 (For No.)

Please Help ASAP.

Thanks in Advance

Reeza
Super User

Assuming your time is specified correctly and consistently the following should work.

data have;

informat beg_time end_time usbegtime usendtime time8.;

format beg_time end_time usbegtime usendtime time8.;

input ID  $   Beg_Time      End_time    USBegTime    USEndTime   ;

cards;

SE1  7:30 12:30  9:30 16:00

SE2 12:30 18:00  9:30 16:00

SE3 11:00 14:00  9:30 16:00

SE4  7:30 18:00  9:30 16:00

SE5  2:00 10:00  9:30 16:00

SE6  1:00  7:00  9:30 16:00

;

run;

data want;

set have;

format overlap_start overlap_end overlap time8.;

if  usbegtime < beg_time <usendtime

    or usbegtime < End_time <usendtime 

    or beg_time < usbegtime <end_time

    or beg_time < usendtime < end_time then overlap=1;

if overlap=1 then do;

    overlap_start=max(usbegtime, beg_time);

    overlap_end=min(usendtime, end_time);

end;

overlap=overlap_end-overlap_start;

run;

rkdubey84
Obsidian | Level 7

Thanks a ton for the last reply. I am stuck at a new prb. I have two datasets as follows

Dataset A

ID     Exchange

A      NYSE

B      NASDAQ

C      BSE

D      NSE

Dataset B

ID    Open     Close

A      10         20

B       15        20

C       30        40

Dataset A contains more number of IDs than Dataset B. I wanted to Add the Column Exchange in Dataset B So that Final Output is like:

ID    Open     Close Exchange

A      10         20     NYSE

B       15        20     NASDAQ

C       30        40     BSE

When I use Merge statement, It automatically adds the ID values D and E as well. How to stop that and solve this prb.

Looking for your help.

Thanks.

Ritesh

Tom
Super User Tom
Super User

Use the IN dataset option.

data want;

  merge b(in=in1) a(in=in2);

by id;

if in1 and in2;

run;

rkdubey84
Obsidian | Level 7

Thanks Tom, for the reply. I tried it, but it reduces my data-set significantly. Am I doing something wrong here: I am new to the IN Option Use.

Thanks.

183  data taqsum.match123;

184    merge taqsum.match(in=in1) taqsum.symbol_homeexchange(in=in2);

185  by symbol;

186  if in1 and in2;

187  run;

NOTE: There were 169 observations read from the data set TAQSUM.MATCH.

NOTE: There were 463 observations read from the data set TAQSUM.SYMBOL_HOMEEXCHANGE.

NOTE: The data set TAQSUM.MATCH123 has 5 observations and 90 variables.

NOTE: DATA statement used (Total process time):

      real time           0.05 seconds

      cpu time            0.03 seconds

rkdubey84
Obsidian | Level 7

Thanks Tom, I used:

data want;

  merge b(in=in1) a(in=in2);

by id;

if in2;

run;

And it did the way I wanted. Thanks a lot again. Smiley Happy

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 5216 views
  • 0 likes
  • 6 in conversation