I have a program that provides output as follows:
Year Month depart dest Count
2017 8 CYVR CYYZ 578
2017 8 CYOW CYYZ 578
These are airport departure and destination codes. I can select which month/year combo I want and all is good.
Here is the question, how can I compare two months (say August 2016 and August 2016) and have ONLY CHANGED CITY PAIRS come out.
I'm thinking I will need to concatenate depart and dest to, say, departdest (e.g., CYVRCYYZ from above) and use that variable to compare. Essentially I am only interesting in saying:
"When comparing August 2017 to August 2016 the following city pairs were added. When comparing August 2017 to August 2016 the following city pairs were removed."
Any initial ideas would be great
thx
Here's a solution that doesn't scale well to multiple data sets IMO. You need to manually code it and customize for each month.
And an SQL option that does scale, using EXCEPT
You may be able to get away with:
proc sql; create table WANT as select * from (select distinct YEAR,MONTH,DEPART,DEST from HAVE) where count(*) > 1; quit;
Not not tested, post test data in the form of a datastep.
Here is more info
I have two tables:
AUGUST_2017
AUGUST_2016
Each table contains airport city pair data (I can post data if needed). Data looks like:
AUGUST_2017:
Year Month depart dest Count
2017 8 CYVR CYYZ 578
2017 8 CYYZ CYVR 577
2017 8 CYOW CYYZ 530
2017 8 CYYZ CYOW 529
AUGUST_2016:
Year Month depart dest Count
2016 8 CYVR CYYZ 578
2016 8 CYYZ CYVR 577
2016 8 CYOW CYYZ 530
2016 8 CYYZ CYOW 529
and so on for thousands of flights.
I need to be able to query both existing tables.
The first query would be to check what is in AUGUST_2017 that is not in AUGUST_2016. These would be added flights.
The second query would be to check what is in AUGUST_2016 that is not in AUGUST_2017. These would be flights removed by the airline.
I don't think my original post was clear enough. Sorry.
I am not sure of the code..can you post the sql
PROC SQL;
CREATE TABLE EGTASK.TEST2 AS
SELECT t1.depart,
t1.dest
intersect EGTASK.AUGUST_2017 t1, EGTASK.AUGUST_2016 t2;
QUIT;
1. Your sample data all are the same so there are no differences to be found
2. You didn't put them as a data step so I'd have to write code to import your data first
3. To demonstrate an actual example I'd have to add new data that would show the differences
4. The code below is INTERSECT which would show the same in both data sets. EXCEPT will show the exceptions.
So even if the code worked, there's no way to test it at present with your sample data.
Here's a reference for the SQL operators with examples
http://www2.sas.com/proceedings/sugi31/242-31.pdf
Here's a solution that doesn't scale well to multiple data sets IMO. You need to manually code it and customize for each month.
And an SQL option that does scale, using EXCEPT
Thank you!!!
I have tested and your solution is perfect. I ended up with:
/* The code below will get all flights that were removed from the previous year */
PROC SQL;
CREATE TABLE EGTASK.ACM_FLT_REMOVED AS
SELECT depart, dest
FROM EGTASK.CITY_PR_TOTAL_PYEAR
EXCEPT
SELECT depart, dest from EGTASK.CITY_PR_TOTAL_CYEAR;
QUIT;
/* The code below will get all flights that were added in the current year */
PROC SQL;
CREATE TABLE EGTASK.ACM_FLT_ADDED AS
SELECT depart, dest
FROM EGTASK.CITY_PR_TOTAL_CYEAR
EXCEPT
SELECT depart, dest from EGTASK.CITY_PR_TOTAL_PYEAR;
QUIT;
As you can see I have generalized it to current year (CYEAR) and previous year (PYEAR).
Thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.