DATA Step, Macro, Functions and more

Do loops within a group

Reply
New Contributor
Posts: 2

Do loops within a group

I'm posting here for the first time so forgive me if I doing it incorrectly.

I'm a new sas user and am having trouble figuring out how to calculate a variable using a do loop.

The data I'm looking at is household vehicle trip activity. The variables include Trip ID, Trip start time, Trip end time, and Household ID.

What I'm trying to figure out is whether a trip's time overlaps with the other trip times of the same household. Here's an example of what I'm looking at.

Trip IDTrip Start TimeTrip End TimeHousehold ID
18:0017:00999
29:309:45999
310:0018:00999

My goal is to calculate the number of overlapping trips per household. In the final data-set each observation represents an unique household and the variable is the number of overlapping trips (in the case of the table above, 2)

Does anyone know how I can use a do loop to calculate whether  the end time in one trip (say trip 1) is greater than the start time in the other trips (trip 2 and trip 3)?

I've never used a do loop before so any help would be greatly appreciated.

I tried the syntax below but it didn't work the way I wanted it to. Plus it doesn't capture households  with more than 2 trips.

data travel_2

set travel

by houseid

overlap= .;

if (first.endtime>last.starttime) then overlap=1;

else overlap= 0;

run;

Respected Advisor
Posts: 3,156

Re: Do loops within a group

Question:

Is overlap a boolean variable or can be accumulated if more than one overlap within the group?

Haikuo

New Contributor
Posts: 2

Re: Do loops within a group

Haikuo,

Yes "overlap" should be a boolean variable. (1/0).

Richard

Respected Advisor
Posts: 3,156

Re: Do loops within a group

Since you have not laid out what your final output look like, this is just based on my guessing, and it can be easily tweaked:

data have;

input TID$ ST :time8. ET :time8. HID$;

format ST :time8. ET :time8.;

cards;

1 8:00 17:00 999

2 9:30 9:45 999

3 10:00 18:00 999

1 2:00 10:00 100

2 11:00 11:30 100

3 12:00 13:00 100

4 12:30 17:30 100

;

data want(drop=tid st et rc rename=(_tid=Trip_ID _st=Trip_start_time _et=Trip_end_time));

  if _n_=1 then do;

  if 0 then set have;

dcl hash h(dataset:'have', multidata:'y');

h.definekey('hid');

h.definedata(all:'y');

h.definedone();

  end;

  overlap=0;

  set have (rename=(tid=_tid st=_st et=_et));

  do rc=h.find() by 0 while (rc=0);

  if _tid ne tid and (_st<=st<=_et or _st<=et<=_et or (st>=_st and et<=_et) or (_st>=st and _et<=et)) then overlap+1;

if overlap>0 then leave;

rc=h.find_next();

end;

run;

proc print;run;

Haikuo

Super User
Posts: 19,805

Re: Do loops within a group

Here's an attempt, not sure it'll catch all the scenario's and not sure what you're looking for overall...

*Input your sample data;

data have;

    input tripid start_time time5. end_time time5.;

    format start_time end_time time5.;

    household_id=999;

    cards;

1 8:00 17:00

2 9:30 9:45

3 10:00 18:00

;

run;

*Find all overlapping trips

proc sql;

    create table want as

    select a.*, b.start_time as overlap_start, b.end_time as overlap_end, b.tripid as overlap_id

    from have a

    left join have b

    on (b.start_time between a.start_time and a.end_time)

    and a.tripid ne b.tripid and a.household_id=b.household_id;

quit;

*Count the number of overlaps per trip/household;

proc sql;

    create table overlaps as

    select household_id, tripid, count(overlap_id) as overlaps

    from want

    group by household_id, tripid;

    *where overlap_start ne .;

quit;

Respected Advisor
Posts: 3,156

Re: Do loops within a group

Reeza,

on (b.start_time between a.start_time and a.end_time or  b.end_time between a.start_time and a.end_time ) seems to catch more.

Regards,

Haikuo

Super User
Posts: 19,805

Re: Do loops within a group

You'll get duplicates then (ie Trip 1 overlaps with trip 3 and Trip 3 overlaps with Trip 1).

Respected Advisor
Posts: 3,156

Re: Do loops within a group

Yes, Reeza. I just realized that our differences were the results from  different understanding towards OP's intention. if treating 'household_id' as the unit to count, your solution worked perfectly; while if treating each individual trip_id as unit to count, then my concern would make sense.

Haikuo

Super User
Posts: 10,028

Re: Do loops within a group

data have;
input TID$ ST :time8. ET :time8. HID$;
format ST :time8. ET :time8.;
cards;
1 8:00 17:00 999
2 9:30 9:45 999
3 10:00 18:00 999
1 2:00 10:00 100
2 11:00 11:30 100
3 12:00 13:00 100
4 12:30 17:30 100
;
run;
proc sql;
create table temp as
 select a.hid,a.tid,b.tid as _tid
  from have as a,have as b
   where a.hid=b.hid and a.tid ne b.tid and 
    (b.st between a.st and a.et or b.et between a.st and a.et )
   order by hid;
quit;
data want(keep=hid count string);
 set temp;
 by hid;
 length string $ 800 ;
 retain string ;
 if not findw(string,strip(tid)) then string=catx(' ',string,tid);
 if not findw(string,strip(_tid)) then string=catx(' ',string,_tid);
 if last.hid then do;
                    count=countw(string);
                    output;
                    call missing(string);
                  end;
run;

Ksharp

Contributor
Posts: 36

Re: Do loops within a group

Hi...I need a way to calculate Overlap minutes with respect to US Trading Hours 9:30am to 4:00pm.

-----------------------9:30---------------------------------16:00-----------------

Scenario 1: 7:30--------:**************12:30

Scenario 2:                            12:30******************:----18:00

Scenario 3:                     11:00**************14:00

Scenario 4: 7:30--------:*************************************:----18:00

Scenario 5: ------------:********10:00                                  2:00-----

I guess only these 5 scenarios (Different Country Stock Exchange Trading Times) are possible of an overlap. Scenarios Represent Opening and Closing time for various exchanges whereas 9:30 to 16:00 is the US time.

An extra column for boolean if Overlaps or not can also be added.

You may take this as the Input Dataset

Input Data:

ID     Beg_Time      End_time    USBegTime    USEndTime   Overlap  IsOverlap

SE1      7:30          12:30               9:30              16:00          ?           1 (for yes)

SE2      12:30         18:00              9:30              16:00          ?

SE3      11:00         14:00              9:30              16:00          ?

SE4      7:30          18:00               9:30              16:00         ?

SE5      2:00          10:00               9:30              16:00         ?

SE6      1:00          7:00                 9:30              16:00         ?             0 (For No.)

Please Help ASAP.

Thanks in Advance

Super User
Posts: 19,805

Re: Do loops within a group

Posted in reply to rkdubey84

Assuming your time is specified correctly and consistently the following should work.

data have;

informat beg_time end_time usbegtime usendtime time8.;

format beg_time end_time usbegtime usendtime time8.;

input ID  $   Beg_Time      End_time    USBegTime    USEndTime   ;

cards;

SE1  7:30 12:30  9:30 16:00

SE2 12:30 18:00  9:30 16:00

SE3 11:00 14:00  9:30 16:00

SE4  7:30 18:00  9:30 16:00

SE5  2:00 10:00  9:30 16:00

SE6  1:00  7:00  9:30 16:00

;

run;

data want;

set have;

format overlap_start overlap_end overlap time8.;

if  usbegtime < beg_time <usendtime

    or usbegtime < End_time <usendtime 

    or beg_time < usbegtime <end_time

    or beg_time < usendtime < end_time then overlap=1;

if overlap=1 then do;

    overlap_start=max(usbegtime, beg_time);

    overlap_end=min(usendtime, end_time);

end;

overlap=overlap_end-overlap_start;

run;

Contributor
Posts: 36

Re: Do loops within a group

Thanks a ton for the last reply. I am stuck at a new prb. I have two datasets as follows

Dataset A

ID     Exchange

A      NYSE

B      NASDAQ

C      BSE

D      NSE

Dataset B

ID    Open     Close

A      10         20

B       15        20

C       30        40

Dataset A contains more number of IDs than Dataset B. I wanted to Add the Column Exchange in Dataset B So that Final Output is like:

ID    Open     Close Exchange

A      10         20     NYSE

B       15        20     NASDAQ

C       30        40     BSE

When I use Merge statement, It automatically adds the ID values D and E as well. How to stop that and solve this prb.

Looking for your help.

Thanks.

Ritesh

Super User
Super User
Posts: 7,050

Re: Do loops within a group

Posted in reply to rkdubey84

Use the IN dataset option.

data want;

  merge b(in=in1) a(in=in2);

by id;

if in1 and in2;

run;

Contributor
Posts: 36

Re: Do loops within a group

Thanks Tom, for the reply. I tried it, but it reduces my data-set significantly. Am I doing something wrong here: I am new to the IN Option Use.

Thanks.

183  data taqsum.match123;

184    merge taqsum.match(in=in1) taqsum.symbol_homeexchange(in=in2);

185  by symbol;

186  if in1 and in2;

187  run;

NOTE: There were 169 observations read from the data set TAQSUM.MATCH.

NOTE: There were 463 observations read from the data set TAQSUM.SYMBOL_HOMEEXCHANGE.

NOTE: The data set TAQSUM.MATCH123 has 5 observations and 90 variables.

NOTE: DATA statement used (Total process time):

      real time           0.05 seconds

      cpu time            0.03 seconds

Contributor
Posts: 36

Re: Do loops within a group

Posted in reply to rkdubey84

Thanks Tom, I used:

data want;

  merge b(in=in1) a(in=in2);

by id;

if in2;

run;

And it did the way I wanted. Thanks a lot again. Smiley Happy

Ask a Question
Discussion stats
  • 14 replies
  • 895 views
  • 0 likes
  • 6 in conversation