Re: Select Data with Same ID Based on Conditions

BCNAV · Posted 10-24-2018 11:50 AM

When aircraft fly in our airspace they are given a unique numeric ID called FLT_BIZ_UID. As messages are sent between the aircraft and ATC, they are stored (items stored include, the message, aircraft type and a bunch of other things). Every time there is a message by that specific flight a new FLT_BIZ_UID record is made (with the same numeric number of course as the flight is continuing on).

For example:

FLT_BIZ_UID 15 variables across EDMONTON_DATALINK GANDER_DATALINK GANDER_VOICE

123456 1 0 0

333333 1 0 0

333333 1 0 1

So as you can see, there are multiple records for FLT_BIZ_UID based on the number of times the aircraft messages ATC

I would like to be able to generate a subset of this data.

I want all FLT_BIZ_UID observations that are all EDMONTON_DATALINK=1 GANDER_DATALINK=0 and GANDER_VOICE=0

So above, all 123456 records would be kept, while none of 333333 would be kept because the last observation of the group containing 333333 in the FLT_BIZ_UID has a GANDER_VOICE=1

Not sure how to approach this

thanks

SuryaKiran · Posted 10-24-2018 12:09 PM

A sub-query might help. Create a query for finding all the FLT_BIZ_UID that are not in the condition your looking for and use that as a sub-query to eliminate those records.

Something like this (untested):

proc sql;
select * from have
where EDMONTON_DATALINK=1 and GANDER_DATALINK=0 and GANDER_VOICE=0
	and FLT_BIZ_UID not in (select  FLT_BIZ_UID from have where not (EDMONTON_DATALINK=1 and GANDER_DATALINK=0 and GANDER_VOICE=0));
quit;

Thanks,
Suryakiran

novinosrin · Posted 10-24-2018 12:21 PM

hi @BCNAV pretty straightforward with Boolean


data have;
input FLT_BIZ_UID  EDMONTON_DATALINK    GANDER_DATALINK     GANDER_VOICE ;
cards;
123456                                                   1                                           0                                       0                           
123456                                                   1                                           0                                       0
333333                                                   1                                           0                                       0
333333                                                   1                                           0                                       0
333333                                                   1                                           0                                       1
;

proc sql;
create table want as
select *
from have
group by FLT_BIZ_UID
having sum(EDMONTON_DATALINK=1 and GANDER_DATALINK=0 and GANDER_VOICE=0)=n(FLT_BIZ_UID);
quit;

novinosrin · Posted 10-24-2018 12:28 PM

The above can also be achieved with a Double DOW or an imposed interleave sorted by FLT_BIZ_UID too but lazy people like me who can't be bothered to type would prefer the above

mkeintz · Posted 10-24-2018 12:43 PM

Double DOW can appear busy, but here's a relatively brief equivalent:

data want (drop=_:);
  set have (in=in1) have (in=in2);
  by flt_biz_uid;
  if first.flt_biz_uid then call missing(_n1,_nqualify);
  _n1+in1;
  _nqualify+(in1=1 and EDMONTON_DATALINK=1 and GANDER_DATALINK=0 and GANDER_VOICE=0);
  if in2 and _n1=_nqualify;
run;

Unlike the proc sql solution, this requires data to be sorted by flt_biz_uid, but that tradeoff is that this would be faster for large datasets.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 10-24-2018 12:44 PM

" tradeoff is that this would be faster for large datasets." -- Fully agree 🙂 .

EDIT to make a point representing my classmates who are (beginners to just intermediates):

Their concern:

"Hey Dude, Since you are regular in communities, can you let the community know our concern i.e

-- Sophisticated codes are lovely and we like it but makes us go back to community over and over the very moment we run into a hiccup"--Our 2 cents mate! DePaul CDM computing lab

I guess yours and DOW etc would fall into high sophistication. 🙂

mkeintz · Posted 10-24-2018 04:46 PM

I would suggest the classmates, or any subscriber, treat Sas communities like any crowd-sourced, free resource. It's a buffet - take only those offerings that are most suitable to your needs. However, over my own subscription tenure, I've often only wanted a "non-sophisticated" solution to a problem, but when I occasionally had time, I was happy to see other solutions that stretched my understanding.

And Yo DePaul CDM computing lab : There is no better status than student to justify putting forth a need for alternatives or more thorough explanations. As far as I can tell - all sas-related comments are welcome on this forum. So you don't need cover from @novinosrin - nobody will bite. Probably this note is the harshest you'd ever get.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

DBailey · Posted 10-24-2018 01:00 PM

similar to below but I read your requirements a bit differently:

proc sql;
select * from have
where FLT_BIZ_UID not in 
(select  FLT_BIZ_UID from have where EDMONTON_DATALINK=0 or GANDER_DATALINK=1 or GANDER_VOICE=1);
quit;

s_lassen · Posted 10-26-2018 05:23 AM

I would do it like this:

data want;
  merge have(where=(not(Edmonton_datalink=1 and Gander_datalink=0 and Gander_voice=0)) in=wrong)
  have
  ;
  by FLT_BIZ_UID;
  if not wrong;
run;

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away