BookmarkSubscribeRSS Feed
BCNAV
Quartz | Level 8

When aircraft fly in our airspace they are given a unique numeric ID called FLT_BIZ_UID. As messages are sent between the aircraft and ATC, they are stored (items stored include, the message, aircraft type and a bunch of other things). Every time there is a message by that specific flight a new FLT_BIZ_UID record is made (with the same numeric number of course as the flight is continuing on).

 

For example:

 

FLT_BIZ_UID     15 variables across    EDMONTON_DATALINK    GANDER_DATALINK     GANDER_VOICE  

123456                                                   1                                           0                                       0                           

123456                                                   1                                           0                                       0

333333                                                   1                                           0                                       0

333333                                                   1                                           0                                       0

333333                                                   1                                           0                                       1

 

So as you can see, there are multiple records for FLT_BIZ_UID based on the number of times the aircraft messages ATC

 

I would like to be able to generate a subset of this data.

 

I want all FLT_BIZ_UID observations that are all EDMONTON_DATALINK=1 GANDER_DATALINK=0 and GANDER_VOICE=0

 

So above, all 123456 records would be kept, while none of 333333 would be kept because the last observation of the group containing 333333 in the FLT_BIZ_UID has a GANDER_VOICE=1

 

Not sure how to approach this

 

thanks

8 REPLIES 8
SuryaKiran
Meteorite | Level 14

A sub-query might help. Create a query for finding all the FLT_BIZ_UID that are not in the condition your looking for and use that as a sub-query to eliminate those records.

 

Something like this (untested):

 

proc sql;
select * from have
where EDMONTON_DATALINK=1 and GANDER_DATALINK=0 and GANDER_VOICE=0
	and FLT_BIZ_UID not in (select  FLT_BIZ_UID from have where not (EDMONTON_DATALINK=1 and GANDER_DATALINK=0 and GANDER_VOICE=0));
quit;
Thanks,
Suryakiran
novinosrin
Tourmaline | Level 20

hi @BCNAV pretty straightforward with Boolean


data have;
input FLT_BIZ_UID  EDMONTON_DATALINK    GANDER_DATALINK     GANDER_VOICE ;
cards;
123456                                                   1                                           0                                       0                           
123456                                                   1                                           0                                       0
333333                                                   1                                           0                                       0
333333                                                   1                                           0                                       0
333333                                                   1                                           0                                       1
;

proc sql;
create table want as
select *
from have
group by FLT_BIZ_UID
having sum(EDMONTON_DATALINK=1 and GANDER_DATALINK=0 and GANDER_VOICE=0)=n(FLT_BIZ_UID);
quit;

 

 

novinosrin
Tourmaline | Level 20

The above can also be achieved with a Double DOW or an imposed interleave sorted by FLT_BIZ_UID too but lazy people like me who can't be bothered to type would prefer the above

mkeintz
PROC Star

Double DOW can appear busy, but here's a relatively brief equivalent:

 

data want (drop=_:);
  set have (in=in1) have (in=in2);
  by flt_biz_uid;
  if first.flt_biz_uid then call missing(_n1,_nqualify);
  _n1+in1;
  _nqualify+(in1=1 and EDMONTON_DATALINK=1 and GANDER_DATALINK=0 and GANDER_VOICE=0);
  if in2 and _n1=_nqualify;
run;

 

Unlike the proc sql solution, this requires data to be sorted by flt_biz_uid, but that tradeoff is that this would be faster for large datasets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

" tradeoff is that this would be faster for large datasets." -- Fully agree 🙂 .

 

EDIT to make a point representing my classmates who are (beginners to  just intermediates):

Their concern:

"Hey Dude, Since you are regular in communities, can you let the community  know our concern i.e

-- Sophisticated  codes are lovely and we like it but makes us  go back to community over and over the very moment we run into a hiccup"--Our 2 cents mate!  DePaul CDM computing lab 

 

I guess yours and DOW etc would fall into high sophistication.  🙂

mkeintz
PROC Star

I would suggest the classmates, or any subscriber, treat Sas communities like any crowd-sourced, free resource.  It's a buffet - take only those offerings that are most suitable to your needs.  However, over my own subscription tenure, I've often only wanted a "non-sophisticated" solution to a problem, but when I occasionally had time, I was happy to see other solutions that stretched my understanding.

 

And Yo DePaul CDM computing lab :  There is no better status than student to justify putting forth a need for alternatives or more thorough explanations.   As far as I can tell - all sas-related comments are welcome on this forum.  So you don't need cover from @novinosrin - nobody will bite.  Probably this note is the harshest you'd ever get.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
DBailey
Lapis Lazuli | Level 10

similar to below but I read your requirements a bit differently:

 

proc sql;
select * from have
where FLT_BIZ_UID not in 
(select FLT_BIZ_UID from have where EDMONTON_DATALINK=0 or GANDER_DATALINK=1 or GANDER_VOICE=1); quit;
s_lassen
Meteorite | Level 14

I would do it like this:

data want;
  merge have(where=(not(Edmonton_datalink=1 and Gander_datalink=0 and Gander_voice=0)) in=wrong)
  have
  ;
  by FLT_BIZ_UID;
  if not wrong;
run;

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1818 views
  • 1 like
  • 6 in conversation