The coronavirus pandemic has affected just about every industry in the world. Some, like manufacturers of medical equipment and video conference app makers, will have found business booming but for the vast majority Covid-19 has been a disaster. Among all these industries though one stands out as having completely stopped - international tourism. Countries all around the world have closed their borders and seem unlikely to reopen them anytime soon.
In this edition of Free Data Friday we will be looking at international tourism data from the United Nations World Tourism Organization (UNWTO) to see which countries receive the most tourists and look for trends over time.
The data can be downloaded as an excel workbook from the UNWTO eLibrary. The workbook has multiple tabs with different breakdowns of the tourist figures but we will simply use the first tab which gives total figures.
There are a number of issues which needed to be addressed before the file could be imported into SAS. As is often the case with Excel files there were a number of header rows which needed to be deleted and a hidden row after the data header which also needed to be made visible and then deleted. I opened the file in Excel to do this and then saved the result as a CSV file to make the import easier. Having done that I used Proc Import to get the file into SAS.
filename reffile '/folders/myshortcuts/Dropbox/International Tourist Arrivals.csv';
proc import datafile=reffile
dbms=csv
out=import
replace;
getnames=yes;
guessingrows=2000;
run;
This is what the first few rows and columns of the imported file look like
The problem here is that the data is "wide" i.e. there is a variable for each year, whereas SAS is easier to work with when the data is "long", with a single variable holding the year value and a variable for the corresponding data. This arrangement of data in a "wide" format is another regular feature of spreadsheets which are easier to work with when the data is arranged in this way. We can convert the data from "wide" to "long" with Proc Transpose like so
proc sort data=import out=import_sorted;
by country;
run;
proc transpose data=import_sorted out=tourists;
by country;
var _1995-_2018;
run;
We now have a file looking like this
This file still needs some tidying up - col1 which holds the tourist data is character and missing values are represented as .. instead of . and the year values have an underscore prefix which needs to be removed. This was achieved with a simple data step
data tourists_fixed(rename=(_name_=year) drop=col1);
length num 8;
set tourists;
_name_=substr(_name_,2,4);
if col1 = ".." then col1=.;
num=input(col1,comma10.);
run;
I decided to look at the top ten countries by tourist numbers for each year. In order to do that I used Proc Rank to get the ranking for each year and Proc SQL to extract the top ten countries into a new file
proc sort data=tourists_fixed(where=(num ne .)) out=tourists_sorted;
by year;
run;
proc rank data=tourists_sorted out=ranked descending;
by year;
var num;
ranks ranking;
run;
proc sql;
create table final
as select *
from ranked
where ranking <= 10
order by year, ranking;
quit;
There is a lot of data to digest here with ten rows for each year from 1995 to 2018. In order to display this I decided to use Proc SGPanel to create a horizontal bar chart for each year.
proc sgpanel data=final;
title1 "Total Inbound Tourists";
title2 "Top Ten Countries (values in 000s)";
footnote1 "Source - United Nations";
footnote2 "Data not available for all countries in all years";
panelby year / novarname columns=1;
hbar country / response=num nozerobars categoryorder=respdesc;
rowaxis label="Country";
colaxis label="Number of Tourists in thousands";
run;
Here is the output for the first two years in the series followed by the last two years
The first thing to note is that we don't have data for all countries and all years e.g. we don't have data for France for 1995. I also found a few surprising features of the data
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.