Re: Do loops within a group

rhnam · Posted 04-11-2012 12:00 PM

I'm posting here for the first time so forgive me if I doing it incorrectly.

I'm a new sas user and am having trouble figuring out how to calculate a variable using a do loop.

The data I'm looking at is household vehicle trip activity. The variables include Trip ID, Trip start time, Trip end time, and Household ID.

What I'm trying to figure out is whether a trip's time overlaps with the other trip times of the same household. Here's an example of what I'm looking at.

Trip ID	Trip Start Time	Trip End Time	Household ID
1	8:00	17:00	999
2	9:30	9:45	999
3	10:00	18:00	999

My goal is to calculate the number of overlapping trips per household. In the final data-set each observation represents an unique household and the variable is the number of overlapping trips (in the case of the table above, 2)

Does anyone know how I can use a do loop to calculate whether the end time in one trip (say trip 1) is greater than the start time in the other trips (trip 2 and trip 3)?

I've never used a do loop before so any help would be greatly appreciated.

I tried the syntax below but it didn't work the way I wanted it to. Plus it doesn't capture households with more than 2 trips.

data travel_2

set travel

by houseid

overlap= .;

if (first.endtime>last.starttime) then overlap=1;

else overlap= 0;

run;

Haikuo · Posted 04-11-2012 12:15 PM

Question:

Is overlap a boolean variable or can be accumulated if more than one overlap within the group?

Haikuo

rhnam · Posted 04-11-2012 01:37 PM

Haikuo,

Yes "overlap" should be a boolean variable. (1/0).

Richard

Haikuo · Posted 04-11-2012 02:05 PM

Since you have not laid out what your final output look like, this is just based on my guessing, and it can be easily tweaked:

data have;

input TID$ ST :time8. ET :time8. HID$;

format ST :time8. ET :time8.;

cards;

1 8:00 17:00 999

2 9:30 9:45 999

3 10:00 18:00 999

1 2:00 10:00 100

2 11:00 11:30 100

3 12:00 13:00 100

4 12:30 17:30 100

;

data want(drop=tid st et rc rename=(_tid=Trip_ID _st=Trip_start_time _et=Trip_end_time));

if _n_=1 then do;

if 0 then set have;

dcl hash h(dataset:'have', multidata:'y');

h.definekey('hid');

h.definedata(all:'y');

h.definedone();

end;

overlap=0;

set have (rename=(tid=_tid st=_st et=_et));

do rc=h.find() by 0 while (rc=0);

if _tid ne tid and (_st<=st<=_et or _st<=et<=_et or (st>=_st and et<=_et) or (_st>=st and _et<=et)) then overlap+1;

if overlap>0 then leave;

rc=h.find_next();

end;

run;

proc print;run;

Haikuo

Reeza · Posted 04-11-2012 12:58 PM

Here's an attempt, not sure it'll catch all the scenario's and not sure what you're looking for overall...

*Input your sample data;

data have;

input tripid start_time time5. end_time time5.;

format start_time end_time time5.;

household_id=999;

cards;

1 8:00 17:00

2 9:30 9:45

3 10:00 18:00

;

run;

*Find all overlapping trips

proc sql;

create table want as

select a.*, b.start_time as overlap_start, b.end_time as overlap_end, b.tripid as overlap_id

from have a

left join have b

on (b.start_time between a.start_time and a.end_time)

and a.tripid ne b.tripid and a.household_id=b.household_id;

quit;

*Count the number of overlaps per trip/household;

proc sql;

create table overlaps as

select household_id, tripid, count(overlap_id) as overlaps

from want

group by household_id, tripid;

*where overlap_start ne .;

quit;

Haikuo · Posted 04-11-2012 01:03 PM

Reeza,

on (b.start_time between a.start_time and a.end_time or b.end_time between a.start_time and a.end_time ) seems to catch more.

Regards,

Haikuo

Reeza · Posted 04-11-2012 02:04 PM

You'll get duplicates then (ie Trip 1 overlaps with trip 3 and Trip 3 overlaps with Trip 1).

Haikuo · Posted 04-11-2012 03:20 PM

Yes, Reeza. I just realized that our differences were the results from different understanding towards OP's intention. if treating 'household_id' as the unit to count, your solution worked perfectly; while if treating each individual trip_id as unit to count, then my concern would make sense.

Haikuo

Ksharp · Posted 04-12-2012 04:08 AM

data have;
input TID$ ST :time8. ET :time8. HID$;
format ST :time8. ET :time8.;
cards;
1 8:00 17:00 999
2 9:30 9:45 999
3 10:00 18:00 999
1 2:00 10:00 100
2 11:00 11:30 100
3 12:00 13:00 100
4 12:30 17:30 100
;
run;
proc sql;
create table temp as
 select a.hid,a.tid,b.tid as _tid
  from have as a,have as b
   where a.hid=b.hid and a.tid ne b.tid and 
    (b.st between a.st and a.et or b.et between a.st and a.et )
   order by hid;
quit;
data want(keep=hid count string);
 set temp;
 by hid;
 length string $ 800 ;
 retain string ;
 if not findw(string,strip(tid)) then string=catx(' ',string,tid);
 if not findw(string,strip(_tid)) then string=catx(' ',string,_tid);
 if last.hid then do;
                    count=countw(string);
                    output;
                    call missing(string);
                  end;
run;

Ksharp

rkdubey84 · Posted 02-11-2013 05:25 PM

Hi...I need a way to calculate Overlap minutes with respect to US Trading Hours 9:30am to 4:00pm.

-----------------------9:30---------------------------------16:00-----------------

Scenario 1: 7:30--------:**************12:30

Scenario 2: 12:30******************:----18:00

Scenario 3: 11:00**************14:00

Scenario 4: 7:30--------:*************************************:----18:00

Scenario 5: ------------:********10:00 2:00-----

I guess only these 5 scenarios (Different Country Stock Exchange Trading Times) are possible of an overlap. Scenarios Represent Opening and Closing time for various exchanges whereas 9:30 to 16:00 is the US time.

An extra column for boolean if Overlaps or not can also be added.

You may take this as the Input Dataset

Input Data:

ID Beg_Time End_time USBegTime USEndTime Overlap IsOverlap

SE1 7:30 12:30 9:30 16:00 ? 1 (for yes)

SE2 12:30 18:00 9:30 16:00 ?

SE3 11:00 14:00 9:30 16:00 ?

SE4 7:30 18:00 9:30 16:00 ?

SE5 2:00 10:00 9:30 16:00 ?

SE6 1:00 7:00 9:30 16:00 ? 0 (For No.)

Please Help ASAP.

Thanks in Advance

Reeza · Posted 02-11-2013 06:20 PM

Assuming your time is specified correctly and consistently the following should work.

data have;

informat beg_time end_time usbegtime usendtime time8.;

format beg_time end_time usbegtime usendtime time8.;

input ID $ Beg_Time End_time USBegTime USEndTime ;

cards;

SE1 7:30 12:30 9:30 16:00

SE2 12:30 18:00 9:30 16:00

SE3 11:00 14:00 9:30 16:00

SE4 7:30 18:00 9:30 16:00

SE5 2:00 10:00 9:30 16:00

SE6 1:00 7:00 9:30 16:00

;

run;

data want;

set have;

format overlap_start overlap_end overlap time8.;

if usbegtime < beg_time <usendtime

or usbegtime < End_time <usendtime

or beg_time < usbegtime <end_time

or beg_time < usendtime < end_time then overlap=1;

if overlap=1 then do;

overlap_start=max(usbegtime, beg_time);

overlap_end=min(usendtime, end_time);

end;

overlap=overlap_end-overlap_start;

run;

rkdubey84 · Posted 02-16-2013 12:23 PM

Thanks a ton for the last reply. I am stuck at a new prb. I have two datasets as follows

Dataset A

ID Exchange

A NYSE

B NASDAQ

C BSE

D NSE

Dataset B

ID Open Close

A 10 20

B 15 20

C 30 40

Dataset A contains more number of IDs than Dataset B. I wanted to Add the Column Exchange in Dataset B So that Final Output is like:

ID Open Close Exchange

A 10 20 NYSE

B 15 20 NASDAQ

C 30 40 BSE

When I use Merge statement, It automatically adds the ID values D and E as well. How to stop that and solve this prb.

Looking for your help.

Thanks.

Ritesh

Tom · Posted 02-16-2013 12:32 PM

Use the IN dataset option.

data want;

merge b(in=in1) a(in=in2);

by id;

if in1 and in2;

run;

rkdubey84 · Posted 02-16-2013 12:47 PM

Thanks Tom, for the reply. I tried it, but it reduces my data-set significantly. Am I doing something wrong here: I am new to the IN Option Use.

Thanks.

183 data taqsum.match123;

184 merge taqsum.match(in=in1) taqsum.symbol_homeexchange(in=in2);

185 by symbol;

186 if in1 and in2;

187 run;

NOTE: There were 169 observations read from the data set TAQSUM.MATCH.

NOTE: There were 463 observations read from the data set TAQSUM.SYMBOL_HOMEEXCHANGE.

NOTE: The data set TAQSUM.MATCH123 has 5 observations and 90 variables.

NOTE: DATA statement used (Total process time):

real time 0.05 seconds

cpu time 0.03 seconds

rkdubey84 · Posted 02-16-2013 01:03 PM

Thanks Tom, I used:

data want;

merge b(in=in1) a(in=in2);

by id;

if in2;

run;

And it did the way I wanted. Thanks a lot again.

Classroom Training Available!