SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BCNAV
Quartz | Level 8

I have raw flight data that I need to make selections from. Each flight is assigned a unique flt_biz_uid.  Each flight can have many messages (msg_biz_uid) based on a flight plan. A message can contain a waypoint (flt_pint_code). A paired down flightplan for one flight is:

 

flt_biz_uidmsg_biz_uidflt_pt_code
375323752859985831IGN
375323752859985832BETBE
375323752859985833KSFO
375323752859985834KJFK
375323752859985835ZORUN
375323752859985836GALLI
375323752859985837ORRCA
375323752859985838PRNCE
375323752859985839LORAH
375323752859985840RUBKI
375323752859985841DAYYY
375323752859985842SIKBO
375323752859985843TULEG
375323752859985844RAP
375323752859985845CZI
375323752859985846BOY
375323752859985847MLD

 

This goes on for millions of flights.  I would like to be able to pick out flt_biz_uids that have specific routes (waypoints). If the above was the complete dataset, and if I wanted to pull out flights that included RUBKI, SIKBO, and TULEG, then ALL of the data above would be pulled. Had I wanted, say, SNOOPY1, SNOOPY2, SNOOPY3 from the above dataset example, then nothing would be returned. I am thinking this maybe SQL?  Not sure.  Thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Hi @BCNAV  To my mind, it does seem you are in the right track to think of SQL. Am i understanding the req correctly?

 

data have;
input (flt_biz_uid	msg_biz_uid	flt_pt_code) (:$20.);
cards;
37532375	2859985831	IGN
37532375	2859985832	BETBE
37532375	2859985833	KSFO
37532375	2859985834	KJFK
37532375	2859985835	ZORUN
37532375	2859985836	GALLI
37532375	2859985837	ORRCA
37532375	2859985838	PRNCE
37532375	2859985839	LORAH
37532375	2859985840	RUBKI
37532375	2859985841	DAYYY
37532375	2859985842	SIKBO
37532375	2859985843	TULEG
37532375	2859985844	RAP
37532375	2859985845	CZI
37532375	2859985846	BOY
37532375	2859985847	MLD
;


%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';

proc sql;
create table want as
select *
from have
where flt_biz_uid in (select flt_biz_uid from have where flt_pt_code in (&select_flightpoints))
order by flt_biz_uid,msg_biz_uid;
quit;

/*Or*/

Should all  select_flightpoints be strictly present then , the below SQL version should suffice

 


%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';
proc sql;
create table want as
select *
from have 
where  flt_biz_uid in 
(select flt_biz_uid 
from have 
where flt_pt_code in (&select_flightpoints)
group by flt_biz_uid
having count(distinct flt_pt_code)=countw("&select_flightpoints",','));
quit;

View solution in original post

4 REPLIES 4
novinosrin
Tourmaline | Level 20

Hi @BCNAV  To my mind, it does seem you are in the right track to think of SQL. Am i understanding the req correctly?

 

data have;
input (flt_biz_uid	msg_biz_uid	flt_pt_code) (:$20.);
cards;
37532375	2859985831	IGN
37532375	2859985832	BETBE
37532375	2859985833	KSFO
37532375	2859985834	KJFK
37532375	2859985835	ZORUN
37532375	2859985836	GALLI
37532375	2859985837	ORRCA
37532375	2859985838	PRNCE
37532375	2859985839	LORAH
37532375	2859985840	RUBKI
37532375	2859985841	DAYYY
37532375	2859985842	SIKBO
37532375	2859985843	TULEG
37532375	2859985844	RAP
37532375	2859985845	CZI
37532375	2859985846	BOY
37532375	2859985847	MLD
;


%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';

proc sql;
create table want as
select *
from have
where flt_biz_uid in (select flt_biz_uid from have where flt_pt_code in (&select_flightpoints))
order by flt_biz_uid,msg_biz_uid;
quit;

/*Or*/

Should all  select_flightpoints be strictly present then , the below SQL version should suffice

 


%let select_flightpoints='RUBKI', 'SIKBO', 'TULEG';
proc sql;
create table want as
select *
from have 
where  flt_biz_uid in 
(select flt_biz_uid 
from have 
where flt_pt_code in (&select_flightpoints)
group by flt_biz_uid
having count(distinct flt_pt_code)=countw("&select_flightpoints",','));
quit;
ChrisNZ
Tourmaline | Level 20

If the table is not sorted, something like this should work:

 

proc sql;
  select a.*
  from TAB a                                    
      ,TAB b                                    
      ,TAB c                                    
      ,TAB d
  where b.FLT_PT_CODE = 'SNOOPY1'
    and c.FLT_PT_CODE = 'SNOOPY2'
    and d.FLT_PT_CODE = 'SNOOPY3'
    and a.FLT_BIZ_UID = b.FLT_BIZ_UID 
    and a.FLT_BIZ_UID = c.FLT_BIZ_UID 
    and a.FLT_BIZ_UID = d.FLT_BIZ_UID ;
  

If it is sorted, there may be faster methods.

 

 

PGStats
Opal | Level 21

If the data is sorted by flt_biz_uid then do:

 

%let select_flightpoints=RUBKI SIKBO TULEG;

data want;
array sf{*} &select_flightpoints;

do until(last.flt_biz_uid); 
	set have; by flt_biz_uid;
	do i = 1 to dim(sf);
		if flt_pt_code = vname(sf{i}) then sf{i} = 1;
		end;
	end;
do until(last.flt_biz_uid); 
	set have; by flt_biz_uid;
	if sum(of sf{*}) = dim(sf) then output;
	end;
drop i &select_flightpoints;
run;
PG
mkeintz
PROC Star

If your data are sorted by here's a proof-of-concept for the Data step:

 

data want (drop=nfound);
  set have (where=  flt_pt_cod in ('RUBKI','SIKBO','TULEG') in=foundwaypoints)
      have (in=allwaypoints)
  by FLTBIZ_UID ;

  if first.fltbiz_uid=1 then nfound=0;
  nfound+foundwaypoints;
  if allwaypoints=1 and nfound=3;
run;

 

Of course, this assumes that no waypoint is recorded more than once per flight.

 

I assume the SQL solution is more common, but this may well be faster, primarily because it takes advantage of the fact that the data are sorted.   So, unlike the SQL solution, it doesn't have to keep somewhere in memory the flight id's of all the flights (both qualifying and not), in case a flight reappears later in the data set.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 792 views
  • 0 likes
  • 5 in conversation