I have raw flight data that I need to make selections from. Each flight is assigned a unique flt_biz_uid. Each flight can have many messages (msg_biz_uid) based on a flight plan. A message can contain a waypoint (flt_pint_code). A paired down flightplan for one flight is:
flt_biz_uid | msg_biz_uid | flt_pt_code |
37532375 | 2859985831 | IGN |
37532375 | 2859985832 | BETBE |
37532375 | 2859985833 | KSFO |
37532375 | 2859985834 | KJFK |
37532375 | 2859985835 | ZORUN |
37532375 | 2859985836 | GALLI |
37532375 | 2859985837 | ORRCA |
37532375 | 2859985838 | PRNCE |
37532375 | 2859985839 | LORAH |
37532375 | 2859985840 | RUBKI |
37532375 | 2859985841 | DAYYY |
37532375 | 2859985842 | SIKBO |
37532375 | 2859985843 | TULEG |
37532375 | 2859985844 | RAP |
37532375 | 2859985845 | CZI |
37532375 | 2859985846 | BOY |
37532375 | 2859985847 | MLD |
This goes on for millions of flights. I would like to be able to pick out flt_biz_uids that have specific routes (waypoints). If the above was the complete dataset, and if I wanted to pull out flights that included RUBKI, SIKBO, and TULEG, then ALL of the data above would be pulled. Had I wanted, say, SNOOPY1, SNOOPY2, SNOOPY3 from the above dataset example, then nothing would be returned. I am thinking this maybe SQL? Not sure. Thanks.
Hi @BCNAV To my mind, it does seem you are in the right track to think of SQL. Am i understanding the req correctly?
data have;
input (flt_biz_uid msg_biz_uid flt_pt_code) (:$20.);
cards;
37532375 2859985831 IGN
37532375 2859985832 BETBE
37532375 2859985833 KSFO
37532375 2859985834 KJFK
37532375 2859985835 ZORUN
37532375 2859985836 GALLI
37532375 2859985837 ORRCA
37532375 2859985838 PRNCE
37532375 2859985839 LORAH
37532375 2859985840 RUBKI
37532375 2859985841 DAYYY
37532375 2859985842 SIKBO
37532375 2859985843 TULEG
37532375 2859985844 RAP
37532375 2859985845 CZI
37532375 2859985846 BOY
37532375 2859985847 MLD
;
%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';
proc sql;
create table want as
select *
from have
where flt_biz_uid in (select flt_biz_uid from have where flt_pt_code in (&select_flightpoints))
order by flt_biz_uid,msg_biz_uid;
quit;
/*Or*/
Should all select_flightpoints be strictly present then , the below SQL version should suffice
%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';
proc sql;
create table want as
select *
from have
where flt_biz_uid in
(select flt_biz_uid
from have
where flt_pt_code in (&select_flightpoints)
group by flt_biz_uid
having count(distinct flt_pt_code)=countw("&select_flightpoints",','));
quit;
Hi @BCNAV To my mind, it does seem you are in the right track to think of SQL. Am i understanding the req correctly?
data have;
input (flt_biz_uid msg_biz_uid flt_pt_code) (:$20.);
cards;
37532375 2859985831 IGN
37532375 2859985832 BETBE
37532375 2859985833 KSFO
37532375 2859985834 KJFK
37532375 2859985835 ZORUN
37532375 2859985836 GALLI
37532375 2859985837 ORRCA
37532375 2859985838 PRNCE
37532375 2859985839 LORAH
37532375 2859985840 RUBKI
37532375 2859985841 DAYYY
37532375 2859985842 SIKBO
37532375 2859985843 TULEG
37532375 2859985844 RAP
37532375 2859985845 CZI
37532375 2859985846 BOY
37532375 2859985847 MLD
;
%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';
proc sql;
create table want as
select *
from have
where flt_biz_uid in (select flt_biz_uid from have where flt_pt_code in (&select_flightpoints))
order by flt_biz_uid,msg_biz_uid;
quit;
/*Or*/
Should all select_flightpoints be strictly present then , the below SQL version should suffice
%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';
proc sql;
create table want as
select *
from have
where flt_biz_uid in
(select flt_biz_uid
from have
where flt_pt_code in (&select_flightpoints)
group by flt_biz_uid
having count(distinct flt_pt_code)=countw("&select_flightpoints",','));
quit;
If the table is not sorted, something like this should work:
proc sql;
select a.*
from TAB a
,TAB b
,TAB c
,TAB d
where b.FLT_PT_CODE = 'SNOOPY1'
and c.FLT_PT_CODE = 'SNOOPY2'
and d.FLT_PT_CODE = 'SNOOPY3'
and a.FLT_BIZ_UID = b.FLT_BIZ_UID
and a.FLT_BIZ_UID = c.FLT_BIZ_UID
and a.FLT_BIZ_UID = d.FLT_BIZ_UID ;
If it is sorted, there may be faster methods.
If the data is sorted by flt_biz_uid then do:
%let select_flightpoints=RUBKI SIKBO TULEG;
data want;
array sf{*} &select_flightpoints;
do until(last.flt_biz_uid);
set have; by flt_biz_uid;
do i = 1 to dim(sf);
if flt_pt_code = vname(sf{i}) then sf{i} = 1;
end;
end;
do until(last.flt_biz_uid);
set have; by flt_biz_uid;
if sum(of sf{*}) = dim(sf) then output;
end;
drop i &select_flightpoints;
run;
If your data are sorted by here's a proof-of-concept for the Data step:
data want (drop=nfound);
set have (where= flt_pt_cod in ('RUBKI','SIKBO','TULEG') in=foundwaypoints)
have (in=allwaypoints)
by FLTBIZ_UID ;
if first.fltbiz_uid=1 then nfound=0;
nfound+foundwaypoints;
if allwaypoints=1 and nfound=3;
run;
Of course, this assumes that no waypoint is recorded more than once per flight.
I assume the SQL solution is more common, but this may well be faster, primarily because it takes advantage of the fact that the data are sorted. So, unlike the SQL solution, it doesn't have to keep somewhere in memory the flight id's of all the flights (both qualifying and not), in case a flight reappears later in the data set.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.