BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BCNAV
Quartz | Level 8

I have data that looks like:

 

FLT_BIZ_UIDMSG_BIZ_UIDDATA_LINK_INDICOMM_INDEDM_DATALINKGANDER_DATALINKGANDER_VOICE
361848932758785634  000
361848932758845155  000
361848932758903711Y 010
361848932758903960 Y001
361848932758955914Y 010
361848932759004380Y 010
361848942758785634  000
361848942758845155  000
361848942758903711 Y001
361848942758903960 Y001
361848942758955914  000
361848942759004380  000

 

Every flt_biz_uid can have 1 or more messages (msg_biz_uid). I need to select flt_biz_uids that are voice only. Those that are voice only ALWAYS have DATA_LINK_IND missing, ICOMM_IND=Y, EDM_DATALINK=0, GANDER_DATALINK=0, and  GANDER_VOICE=1.

 

So in the example above, flt_biz_uid 36184893 would not be chosen, but 36184894 would be. Would there be something in sql that would get his done?

 

Thx!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Assuming i understand the business logic

 


data have;
input FLT_BIZ_UID	MSG_BIZ_UID	DATA_LINK_IND :$1.	ICOMM_IND  :$1.	EDM_DATALINK	GANDER_DATALINK	GANDER_VOICE;
cards;
36184893	2758785634	.	.	0	0	0
36184893	2758845155	.	.	0	0	0
36184893	2758903711	Y	.	0	1	0
36184893	2758903960	.	Y	0	0	1
36184893	2758955914	Y	.	0	1	0
36184893	2759004380	Y	.	0	1	0
36184894	2758785634	.	.	0	0	0
36184894	2758845155	.	.	0	0	0
36184894	2758903711	.	Y	0	0	1
36184894	2758903960	.	Y	0	0	1
36184894	2758955914	.	.	0	0	0
36184894	2759004380	.	.	0	0	0
;

proc sql;
create table want as
select *
from have
where FLT_BIZ_UID in
(select  FLT_BIZ_UID from have
where DATA_LINK_IND =' ' AND ICOMM_IND = 'Y' AND EDM_DATALINK = 0 AND GANDER_DATALINK = 0 AND GANDER_VOICE = 1)
group by FLT_BIZ_UID
having max(DATA_LINK_IND ne ' ' or sum(EDM_DATALINK,GANDER_DATALINK)>0)=0;
quit;

/*Or if you want just the qualifying subset i.e GANDER_VOICE=1 within FLT_BIZ_UID*/

proc sql;
create table want as
select *
from have
where FLT_BIZ_UID in
(select  FLT_BIZ_UID from have
where DATA_LINK_IND =' ' AND ICOMM_IND = 'Y' AND EDM_DATALINK = 0 AND GANDER_DATALINK = 0 AND GANDER_VOICE = 1)
group by FLT_BIZ_UID
having max(DATA_LINK_IND ne ' ' or sum(EDM_DATALINK,GANDER_DATALINK)>0)=0 and GANDER_VOICE;
quit;

 

View solution in original post

12 REPLIES 12
Krueger
Pyrite | Level 9

Edit: I apparently missed the most important part, you don't want the first BIZID to appear. I'm updating the code will have it shortly...

PROC SQL;
CREATE TABLE Want AS
SELECT *
FROM HAVE 
WHERE DATA_LINK_IND ="" AND ICOMM_IND = 'Y' AND EDM_DATELINK = 0 AND GANDER_DATALINK = 0 AND GANDER_VOICE = 1;
QUIT;

 

 

After reviewing further help me understand why one vs the other is to appear? They both have a mix of data; both apply to the VOICE requirements you set.

 

If you have some requirements you can create a separate dataset outlining the DISTINCT ID's that should be excluded and then use the below to query that other dataset.

 

PROC SQL;
CREATE TABLE Want AS
SELECT *
FROM HAVE 
WHERE DATA_LINK_IND ="" AND ICOMM_IND = 'Y' AND EDM_DATELINK = 0 AND GANDER_DATALINK = 0 AND GANDER_VOICE = 1
AND FLT_BIZ_UID NOT IN (SELECT FLT_BIZ_UID FROM work.FilterList) ;
QUIT;
BCNAV
Quartz | Level 8

@Krueger 

 

This would not work as it would return 36184893. It has to be on on every message within a flt_biz_uid. That is, the only observations that can occur WITHIN a flight_biz_uid are 

DATA_LINK_IND ="" AND ICOMM_IND = 'Y' AND EDM_DATELINK = 0 AND GANDER_DATALINK = 0 AND GANDER_VOICE = 1;

 

you will note that 36184893 has other messages that would preclude its selection

 

 

BCNAV
Quartz | Level 8

To reiterate: 

 

Each flit_biz_uid (a flight) can have multiple messages (msg_biz_uid). Each message is a type of communication. I am only interested in flt_biz_uids (flights) that have ONLY voice communication throughout the ENTIRE flight. This means that the only messages that should exist for a flight are:

 

DATA_LINK_IND missing, ICOMM_IND=Y, EDM_DATALINK=0, GANDER_DATALINK=0, and  GANDER_VOICE=1 for all messages in a flight.

 

So, for the first flight, flt_biz_uid of 36184893, it would not be chosen because there are OTHER messages that include other conditions.

 

Now, flt_biz_uid 36184894 would be chosen as ALL messages for that flight satisfy:

DATA_LINK_IND missing, ICOMM_IND=Y, EDM_DATALINK=0, GANDER_DATALINK=0, and  GANDER_VOICE=1.

 

Thx

Krueger
Pyrite | Level 9

How/why do the first 2 AND last 2 observations not come into play here then?

36184894 2758785634     0 0 0
36184894 2758845155     0 0 0
36184894 2758955914     0 0 0
36184894 2759004380     0 0 0
BCNAV
Quartz | Level 8

you bring up an EXCELLENT point. I had to check with our radar people. So:

 

DATA_LINK_IND=missing, ICOMM_IND=missing, EDM_DATALINK=0, GANDER_DATALINK=0, GANDER_VOICE=0 really means an error occurred for that message. So this complicates everything. So:

 

1. For a given FLIGHT, remove all instances of DATA_LINK_IND=missing, ICOMM_IND=missing, EDM_DATALINK=0, GANDER_DATALINK=0, GANDER_VOICE=0

 

2. Then check to see for the remaining messages if only the following is true and return those flights

 

DATA_LINK_IND missing, ICOMM_IND=Y, EDM_DATALINK=0, GANDER_DATALINK=0, and  GANDER_VOICE=1.

novinosrin
Tourmaline | Level 20

Assuming i understand the business logic

 


data have;
input FLT_BIZ_UID	MSG_BIZ_UID	DATA_LINK_IND :$1.	ICOMM_IND  :$1.	EDM_DATALINK	GANDER_DATALINK	GANDER_VOICE;
cards;
36184893	2758785634	.	.	0	0	0
36184893	2758845155	.	.	0	0	0
36184893	2758903711	Y	.	0	1	0
36184893	2758903960	.	Y	0	0	1
36184893	2758955914	Y	.	0	1	0
36184893	2759004380	Y	.	0	1	0
36184894	2758785634	.	.	0	0	0
36184894	2758845155	.	.	0	0	0
36184894	2758903711	.	Y	0	0	1
36184894	2758903960	.	Y	0	0	1
36184894	2758955914	.	.	0	0	0
36184894	2759004380	.	.	0	0	0
;

proc sql;
create table want as
select *
from have
where FLT_BIZ_UID in
(select  FLT_BIZ_UID from have
where DATA_LINK_IND =' ' AND ICOMM_IND = 'Y' AND EDM_DATALINK = 0 AND GANDER_DATALINK = 0 AND GANDER_VOICE = 1)
group by FLT_BIZ_UID
having max(DATA_LINK_IND ne ' ' or sum(EDM_DATALINK,GANDER_DATALINK)>0)=0;
quit;

/*Or if you want just the qualifying subset i.e GANDER_VOICE=1 within FLT_BIZ_UID*/

proc sql;
create table want as
select *
from have
where FLT_BIZ_UID in
(select  FLT_BIZ_UID from have
where DATA_LINK_IND =' ' AND ICOMM_IND = 'Y' AND EDM_DATALINK = 0 AND GANDER_DATALINK = 0 AND GANDER_VOICE = 1)
group by FLT_BIZ_UID
having max(DATA_LINK_IND ne ' ' or sum(EDM_DATALINK,GANDER_DATALINK)>0)=0 and GANDER_VOICE;
quit;

 

Krueger
Pyrite | Level 9

@BCNAV wrote:

you bring up an EXCELLENT point. I had to check with our radar people. So:

 

DATA_LINK_IND=missing, ICOMM_IND=missing, EDM_DATALINK=0, GANDER_DATALINK=0, GANDER_VOICE=0 really means an error occurred for that message. So this complicates everything. So:

 

1. For a given FLIGHT, remove all instances of DATA_LINK_IND=missing, ICOMM_IND=missing, EDM_DATALINK=0, GANDER_DATALINK=0, GANDER_VOICE=0

 

2. Then check to see for the remaining messages if only the following is true and return those flights

 

DATA_LINK_IND missing, ICOMM_IND=Y, EDM_DATALINK=0, GANDER_DATALINK=0, and  GANDER_VOICE=1.






That makes more sense. So you can delete that data first.

DATA HAVE; SET HAVE; IF DATA_LINK_IND = " " AND ICOMM_IND = " " .... THEN DELETE; RUN;

Then it looks like @novinosrin code is the only one you could copy paste and make no other changes in order to get the correct data.

BCNAV
Quartz | Level 8

Thank you all for your assistance!!

BCNAV
Quartz | Level 8

All this works great....but I am having an issues understanding the having statement. Suppose I want to change this to pull up errors. So for a flight series (whole flight) consists of 

 

DATA_LINK_IND is missing, ICOMM_IND is missing, EDM_DATALINK=0, GANDER_DATALINK=0, and  GANDER_VOICE=0.

 

How would there "where" and "having" statement change?

 

In addition

 

DATA_LINK_IND is not missing or ICOMM_IND is not missing , EDM_DATALINK=0, GANDER_DATALINK=1 or GANDER_VOICE=1

 

How would there "where" and "having" statement change?

 

 

novinosrin
Tourmaline | Level 20

Hi @BCNAV  Let me explain the logic that the solution utilizes based upon assumptions of sorts

 

The following is how the algorithm works for the SQL processor to execute. Of course SQL optimizer would do other combinations in the join but that's beyond the scope of this thread. 

So,

1. The subquery

(select FLT_BIZ_UID from have
where DATA_LINK_IND =' ' AND ICOMM_IND = 'Y' AND EDM_DATALINK = 0 AND GANDER_DATALINK = 0 AND GANDER_VOICE = 1)

will result in a subset of FLT_BIZ_UID  of our interest, in other words the true records.

 

2. The WHERE clause that is part of the outer query i.e where FLT_BIZ_UID in (results of previous) would mean the outer query would result in matching FLT_BIZ_UID 's of interest however with both records that are true and not true. This in essence becomes a super set with which we are gonna play with.

 

3. Now, we group the resulting superset from 1 & 2 using a GROUP BY clause to examine whether a particular group would qualify for our final desired result.

 

4. Since we are aware HAVING clause generally works with filtering groups based on a summary/condition, we begin to utilise this clause at this point. Let's see how

 

5. The data seems to suggest for any valid ID's i.e all ID's in our superset are anyway valid because of our initial screening and filter with a sub-query that we used in the where clause as detailed in 1 & 2.

 

6. The False/untrue records within the superset are apparently the ones with DATA_LINK_IND with "Y" taken as pointing to something different or one of the numeric columns i.e EDM_DATALINK,GANDER_DATALINK would have a NON-ZERO value. This assumption pretty much classifies the UNTRUE/FALSE records within valid ID's

 

7. Since it's a group, only one or some records are likely  to fall into the False category and therefore we need to determine how to Flag that for the entire group to be dropped considering the fact it contains both False and true records. 

 

8. Therefore the condition described in point 6 is basically converted into a Boolean expression resulting in 1's for False records, 0's for true records.

 

9. Point 8 would result in ID's with false and true records to have both 1 and 0's while ID's with no false records to only have 0's in the Boolean expression.

 

10. Now we need to find a way to drop the entire ID group should it possess at least one FALSE record, therefore  computing a MAX of the Boolean(1,0) would help us in determining the same and so if the max is zero for an ID group, we can safely subset that to our final result.-->max(DATA_LINK_IND ne ' ' or sum(EDM_DATALINK,GANDER_DATALINK)>0)=0

 

I hope the above details are clear enough to understand. If you do, I am sure you can play around considering that you know the algorithm behind the SQL syntax. Of course feel free to let us know any change once you think we are in the same page and then we shall jump on to the next. I am more than happy to work interactively meaning going back and forth. 🙂

BCNAV
Quartz | Level 8

Thanks ...helpful!  However, what about the scenario when a flight has BOTH voice and data messages?

 

FLT_BIZ_UIDMSG_BIZ_UIDDATA_LINK_INDICOMM_INDEDM_DATALINKGANDER_DATALINKGANDER_VOICE
361742342758262666 Y001
361742342758262766Y 010
361742342758262822 Y001
361742342758269036Y 010
361742342758299683  000
361742342758299684Y 010
361742342758323649Y 01

0

 

In this case I think the where statement would be:

 

where (ICOMM_IND = 'Y' or DATA_LINK_IND = 'Y') AND EDM_DATALINK = 0 AND (GANDER_DATALINK = 1 OR GANDER_VOICE = 1))

but I am unsure of the having statement (my crack is below):

 

having max((ICOMM_IND is not missing or DATA_LINK_IND is not missing) or sum(EDM_DATALINK,GANDER_DATALINK,GANDER_VOICE)>1) = 0

Thanks again!

 

 

novinosrin
Tourmaline | Level 20

Okay, so if you are trying to change the WHERE clause , that means we really have to make sure in determining what constitutes to being true records.  So basically the "where" shouldn't change as that's what the business rule is. Once that is absolute, anything that's not true is false. Therefore, the diligence applies in establishing the business rule first i.e to be applied in where clause.

 

Once that is done, we would check for ID's with true and false records and drop those or in other words keep only those ID's with true records. 

 

A change in business rule would warrant a change in WHERE clause is basically the principle, which essentially means that becomes a different program

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1155 views
  • 0 likes
  • 3 in conversation