Solved
New Contributor
Posts: 3

# Conditional Count

Hello,

I was wondering if there is a way to count observations based on parameters given in the current observation. To me this sounds like a macro, however I'm not too familiar with those and maybe you guys know how to do it in a data step or proc sql.

My data looks like this:

 FL_DATE CARRIER FL_NUM ORIGIN DEST crs_dep_time crs_arr_time min_connex_time max_connex_time 2012-01-01 AA 2400 LAX DFW 0:20 5:05 6:05 10:05 2012-01-01 AA 2043 SFO DFW 0:25 5:35 6:35 10:35 2012-01-01 AA 2538 LAS ORD 0:45 6:15 7:15 11:15 2012-01-01 AA 2297 JFK MIA 5:35 8:45 9:45 13:45 2012-01-01 AA 1845 BOS MIA 5:40 9:15 10:15 14:15 2012-01-01 AA 1776 DFW PHL 6:10 10:05 11:05 15:05 2012-01-01 AA 2042 DFW FLL 6:35 10:05 11:05 15:05 2012-01-01 AA 392 ORD LGA 7:15 10:20 11:20 15:20 2012-01-01 AA 1137 ORD MSY 7:20 9:35 10:35 14:35 2012-01-01 AA 1762 MIA ATL 14:00 16:00 17:00 21:00

The goal is to count for each flight number (fl_num) how many other flight numbers are able to connect.

I think I should not get too much into detail but mathematically I would explain it as this:

n=1 = current obs; N+k = each following

min_connex_time(n) < crs_dep_time (n+k) < max_connex_time (n)      AND      Dest(n) = Origin (n+k)

In words, the first observation describes a flight from Los Angeles to Denver. This can be connected to any flight in the following observations that fullfills the conditions, i.e. departing from the origin where the current flight arrived at and departing within the given timespan of min and max_arr_time

This should generate a count variable, which displays how many times the condition was met.

The result should look like this:

 FL_DATE CARRIER FL_NUM ORIGIN DEST crs_dep_time crs_arr_time min_connex_time max_connex_time Connex 2012-01-01 AA 2400 LAX DFW 0:20 5:05 6:05 10:05 2 2012-01-01 AA 2043 SFO DFW 0:25 5:35 6:35 10:35 1 2012-01-01 AA 2538 LAS ORD 0:45 6:15 7:15 11:15 1 2012-01-01 AA 2297 JFK MIA 5:35 8:45 9:45 13:45 0 2012-01-01 AA 1845 BOS MIA 5:40 9:15 10:15 14:15 1 2012-01-01 AA 1776 DFW PHL 6:10 10:05 11:05 15:05 0 2012-01-01 AA 2042 DFW FLL 6:35 10:05 11:05 15:05 0 2012-01-01 AA 392 ORD LGA 7:15 10:20 11:20 15:20 0 2012-01-01 AA 1137 ORD MSY 7:20 9:35 10:35 14:35 0 2012-01-01 AA 1762 MIA ATL 14:00 16:00 17:00 21:00 0

Let me know if I need to clarify anything.

Thanks in advance for any hints.

Kind regards,

Sandun

Edit: I edited one flight number. They are usually distinct values, but in some instances of through flight two observations have the same number. But I will deal of that problem later

Accepted Solutions
Solution
‎01-15-2014 02:52 PM
PROC Star
Posts: 8,164

## Re: Conditional Count

This could probably be done in one step, but I did it in two so that I could review the connection flights that were selected and counted:

proc sql;

create table all as

select a.*,b.fl_num as con_flt

from have a

left join have b

on a.dest eq b.origin

and

a.min_connex_time<=b.crs_dep_time<=max_connex_time

order by a.crs_dep_time

;

create table want as

select distinct FL_DATE, CARRIER,FL_NUM,ORIGIN,DEST,

crs_dep_time,crs_arr_time,min_connex_time,

max_connex_time,

case

when missing(con_flt) then 0

else count(*)

end as counter

from all

group by FL_NUM

order by crs_dep_time

;

quit;

All Replies
PROC Star
Posts: 8,164

## Re: Conditional Count

I would think that proc sql will be the easiest way to code a solution to your problem, but I do have some questions regarding your data.

You have 3 different arrival times and I don't think it is clear how they relate to each other.  e.g., flt 2400, as I understand it, might not arrive until 10:05 (max_arr_time).  However you indicated that it has two possible connections, even though no flights leave from DFW after 6:35.

New Contributor
Posts: 3

## Re: Conditional Count

Ok, so min_arr_time and max_arr_time refer to the minimum connection time of +1h and maximum connection time +5h (Maybe I should've just named it that :smileygrin:  I will edit the variable names in the first post) Sorry for the confusion.

So in the case of Fl_num 2400 it arrives at 5:05h in DFW. Therefore it can connect to flights that depart within 6:05h and 10:05h, in this case flights 1776, departing at 6:10h, and 2042, departing at 6:35h.

Flight 2043 can only connect to 2042, since 1776 departs 20 mins earlier than minimum connection time.

Hope this helps!

Solution
‎01-15-2014 02:52 PM
PROC Star
Posts: 8,164

## Re: Conditional Count

This could probably be done in one step, but I did it in two so that I could review the connection flights that were selected and counted:

proc sql;

create table all as

select a.*,b.fl_num as con_flt

from have a

left join have b

on a.dest eq b.origin

and

a.min_connex_time<=b.crs_dep_time<=max_connex_time

order by a.crs_dep_time

;

create table want as

select distinct FL_DATE, CARRIER,FL_NUM,ORIGIN,DEST,

crs_dep_time,crs_arr_time,min_connex_time,

max_connex_time,

case

when missing(con_flt) then 0

else count(*)

end as counter

from all

group by FL_NUM

order by crs_dep_time

;

quit;

New Contributor
Posts: 3

## Re: Conditional Count

You sir are awesome! Why didn't I come up with that solution?

Kudos for using the join in such an interesting way.

Thank you so much. This was my first post, so now you won me over and I'll stick around and see if I can help others as well.

Posts: 3,167

## Re: Conditional Count

I am aware that Art's solution works (like it always does), here is another one using Hash.

data have;

input FL_DATE :yymmdd10. CARRIER :\$2. (FL_NUM ORIGIN DEST) (:\$4.) (crs_dep_time crs_arr_time min_connex_time max_connex_time) (:time5.);

cards;

2012-01-01 AA 2400 LAX DFW 0:20 5:05 6:05 10:05

2012-01-01 AA 2043 SFO DFW 0:25 5:35 6:35 10:35

2012-01-01 AA 2538 LAS ORD 0:45 6:15 7:15 11:15

2012-01-01 AA 2297 JFK MIA 5:35 8:45 9:45 13:45

2012-01-01 AA 1845 BOS MIA 5:40 9:15 10:15 14:15

2012-01-01 AA 1776 DFW PHL 6:10 10:05 11:05 15:05

2012-01-01 AA 2042 DFW FLL 6:35 10:05 11:05 15:05

2012-01-01 AA 392    ORD LGA 7:15 10:20 11:20 15:20

2012-01-01 AA 1137 ORD MSY 7:20 9:35 10:35 14:35

2012-01-01 AA 1762 MIA ATL 14:00 16:00 17:00 21:00

;

data want;

if _n_=1 then do;

declare hash h(dataset: 'have (rename=(crs_dep_time=_dep))', multidata:'y');

h.definekey('fl_date', 'carrier', 'origin');

h.definedata('_dep');

h.definedone(); call missing (_dep);

end;

set have;

rc=h.find(key: fl_date, key: carrier, key: dest);

count=0;

do rc=0 by 0 while (rc=0);

if min_connex_time < _dep < max_connex_time then count+1;

rc=h.find_next(key: fl_date, key: carrier, key: dest);

end;

drop rc _dep;

run;

Haikuo

🔒 This topic is solved and locked.