SAS Programming

BCNAV · Posted 02-20-2020 03:21 PM

I have raw flight data that I need to make selections from. Each flight is assigned a unique flt_biz_uid. Each flight can have many messages (msg_biz_uid) based on a flight plan. A message can contain a waypoint (flt_pint_code). A paired down flightplan for one flight is:

flt_biz_uid	msg_biz_uid	flt_pt_code
37532375	2859985831	IGN
37532375	2859985832	BETBE
37532375	2859985833	KSFO
37532375	2859985834	KJFK
37532375	2859985835	ZORUN
37532375	2859985836	GALLI
37532375	2859985837	ORRCA
37532375	2859985838	PRNCE
37532375	2859985839	LORAH
37532375	2859985840	RUBKI
37532375	2859985841	DAYYY
37532375	2859985842	SIKBO
37532375	2859985843	TULEG
37532375	2859985844	RAP
37532375	2859985845	CZI
37532375	2859985846	BOY
37532375	2859985847	MLD

This goes on for millions of flights. I would like to be able to pick out flt_biz_uids that have specific routes (waypoints). If the above was the complete dataset, and if I wanted to pull out flights that included RUBKI, SIKBO, and TULEG, then ALL of the data above would be pulled. Had I wanted, say, SNOOPY1, SNOOPY2, SNOOPY3 from the above dataset example, then nothing would be returned. I am thinking this maybe SQL? Not sure. Thanks.

novinosrin · Posted 02-20-2020 03:43 PM

Hi @BCNAV To my mind, it does seem you are in the right track to think of SQL. Am i understanding the req correctly?

data have;
input (flt_biz_uid	msg_biz_uid	flt_pt_code) (:$20.);
cards;
37532375	2859985831	IGN
37532375	2859985832	BETBE
37532375	2859985833	KSFO
37532375	2859985834	KJFK
37532375	2859985835	ZORUN
37532375	2859985836	GALLI
37532375	2859985837	ORRCA
37532375	2859985838	PRNCE
37532375	2859985839	LORAH
37532375	2859985840	RUBKI
37532375	2859985841	DAYYY
37532375	2859985842	SIKBO
37532375	2859985843	TULEG
37532375	2859985844	RAP
37532375	2859985845	CZI
37532375	2859985846	BOY
37532375	2859985847	MLD
;


%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';

proc sql;
create table want as
select *
from have
where flt_biz_uid in (select flt_biz_uid from have where flt_pt_code in (&select_flightpoints))
order by flt_biz_uid,msg_biz_uid;
quit;

/*Or*/

Should all select_flightpoints be strictly present then , the below SQL version should suffice


%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';
proc sql;
create table want as
select *
from have 
where  flt_biz_uid in 
(select flt_biz_uid 
from have 
where flt_pt_code in (&select_flightpoints)
group by flt_biz_uid
having count(distinct flt_pt_code)=countw("&select_flightpoints",','));
quit;

View solution in original post

novinosrin · Posted 02-20-2020 03:43 PM

Hi @BCNAV To my mind, it does seem you are in the right track to think of SQL. Am i understanding the req correctly?

data have;
input (flt_biz_uid	msg_biz_uid	flt_pt_code) (:$20.);
cards;
37532375	2859985831	IGN
37532375	2859985832	BETBE
37532375	2859985833	KSFO
37532375	2859985834	KJFK
37532375	2859985835	ZORUN
37532375	2859985836	GALLI
37532375	2859985837	ORRCA
37532375	2859985838	PRNCE
37532375	2859985839	LORAH
37532375	2859985840	RUBKI
37532375	2859985841	DAYYY
37532375	2859985842	SIKBO
37532375	2859985843	TULEG
37532375	2859985844	RAP
37532375	2859985845	CZI
37532375	2859985846	BOY
37532375	2859985847	MLD
;


%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';

proc sql;
create table want as
select *
from have
where flt_biz_uid in (select flt_biz_uid from have where flt_pt_code in (&select_flightpoints))
order by flt_biz_uid,msg_biz_uid;
quit;

/*Or*/

Should all select_flightpoints be strictly present then , the below SQL version should suffice


%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';
proc sql;
create table want as
select *
from have 
where  flt_biz_uid in 
(select flt_biz_uid 
from have 
where flt_pt_code in (&select_flightpoints)
group by flt_biz_uid
having count(distinct flt_pt_code)=countw("&select_flightpoints",','));
quit;

ChrisNZ · Posted 02-20-2020 03:55 PM

If the table is not sorted, something like this should work:

proc sql;
  select a.*
  from TAB a                                    
      ,TAB b                                    
      ,TAB c                                    
      ,TAB d
  where b.FLT_PT_CODE = 'SNOOPY1'
    and c.FLT_PT_CODE = 'SNOOPY2'
    and d.FLT_PT_CODE = 'SNOOPY3'
    and a.FLT_BIZ_UID = b.FLT_BIZ_UID 
    and a.FLT_BIZ_UID = c.FLT_BIZ_UID 
    and a.FLT_BIZ_UID = d.FLT_BIZ_UID ;

If it is sorted, there may be faster methods.

High-Performance SAS Coding - Third Edition

PGStats · Posted 02-20-2020 05:44 PM

If the data is sorted by flt_biz_uid then do:

%let select_flightpoints=RUBKI SIKBO TULEG;

data want;
array sf{*} &select_flightpoints;

do until(last.flt_biz_uid); 
	set have; by flt_biz_uid;
	do i = 1 to dim(sf);
		if flt_pt_code = vname(sf{i}) then sf{i} = 1;
		end;
	end;
do until(last.flt_biz_uid); 
	set have; by flt_biz_uid;
	if sum(of sf{*}) = dim(sf) then output;
	end;
drop i &select_flightpoints;
run;

PG

mkeintz · Posted 02-20-2020 09:30 PM

If your data are sorted by here's a proof-of-concept for the Data step:

data want (drop=nfound);
  set have (where=  flt_pt_cod in ('RUBKI','SIKBO','TULEG') in=foundwaypoints)
      have (in=allwaypoints)
  by FLTBIZ_UID ;

  if first.fltbiz_uid=1 then nfound=0;
  nfound+foundwaypoints;
  if allwaypoints=1 and nfound=3;
run;

Of course, this assumes that no waypoint is recorded more than once per flight.

I assume the SQL solution is more common, but this may well be faster, primarily because it takes advantage of the fact that the data are sorted. So, unlike the SQL solution, it doesn't have to keep somewhere in memory the flight id's of all the flights (both qualifying and not), in case a flight reappears later in the data set.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Programming

Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Follow Us

What is...

SAS Programming

Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Re: Selecting Data (Based on Flight Waypoints)

Register Today!

SAS Training: Just a Click Away

Follow Us

What is...