BookmarkSubscribeRSS Feed

Examining International Tourism Data with SAS

Started ‎05-01-2020 by
Modified ‎08-03-2021 by
Views 4,056
Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
 

 

The coronavirus pandemic has affected just about every industry in the world. Some, like woman-walking-on-pathway-while-strolling-luggage-1008155.jpgmanufacturers of medical equipment and video conference app makers, will have found business booming but for the vast majority Covid-19 has been a disaster. Among all these industries though one stands out as having completely stopped - international tourism. Countries all around the world have closed their borders and seem unlikely to reopen them anytime soon.

 

In this edition of Free Data Friday we will be looking at international tourism data from the United Nations World Tourism Organization (UNWTO) to see which countries receive the most tourists and look for trends over time.

 

Get the Data

 

The data can be downloaded as an excel workbook from the UNWTO eLibrary. The workbook has multiple tabs with different breakdowns of the tourist figures but we will simply use the first tab which gives total figures.

 

Get started with SAS OnDemand for Academics

 
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:

Get Started

 

Getting the Data Ready

 

There are a number of issues which needed to be addressed before the file could be imported into SAS. As is often the case with Excel files there were a number of header rows which needed to be deleted and a hidden row after the data header which also needed to be made visible and then deleted. I opened the file in Excel to do this and then saved the result as a CSV file to make the import easier. Having done that I used Proc Import to get the file into SAS.

 

filename reffile '/folders/myshortcuts/Dropbox/International Tourist Arrivals.csv';

proc import datafile=reffile
	dbms=csv
	out=import
	replace;
	getnames=yes;
	guessingrows=2000;
run;

This is what the first few rows and columns of the imported file look like

 

DS1.png

 

The problem here is that the data is "wide" i.e. there is a variable for each year, whereas SAS is easier to work with when the data is "long", with a single variable holding the year value and a variable for the corresponding data. This arrangement of data in a "wide" format is another regular feature of spreadsheets which are easier to work with when the data is arranged in this way. We can convert the data from "wide" to "long" with Proc Transpose like so

 

proc sort data=import out=import_sorted;
	by country;
run;

proc transpose data=import_sorted out=tourists;
	by country;
	var _1995-_2018;
run;

 

We now have a file looking like this

 

DS2.png

 

This file still needs some tidying up - col1 which holds the tourist data is character and missing values are represented as .. instead of . and the year values have an underscore prefix which needs to be removed. This was achieved with a simple data step

 

data tourists_fixed(rename=(_name_=year) drop=col1);
	length num 8;
	set tourists;
	_name_=substr(_name_,2,4);
	if col1 = ".." then col1=.;
	num=input(col1,comma10.);
run;

 

DS3.png

 

The Results

 

I decided to look at the top ten countries by tourist numbers for each year. In order to do that I used Proc Rank to get the ranking for each year and Proc SQL to extract the top ten countries into a new file

 

proc sort data=tourists_fixed(where=(num ne .)) out=tourists_sorted;
	by year;
run;

proc rank data=tourists_sorted out=ranked descending;
	by year;
	var num;
	ranks ranking;
run;

proc sql;
	create table final
	as select *
	from ranked
	where ranking <= 10
	order by year, ranking;
quit;

DS4.png

 

There is a lot of data to digest here with ten rows for each year from 1995 to 2018. In order to display this I decided to use Proc SGPanel to create a horizontal bar chart for each year.

 

proc sgpanel data=final;
	title1 "Total Inbound Tourists";
	title2 "Top Ten Countries (values in 000s)";
	footnote1 "Source - United Nations";
	footnote2 "Data not available for all countries in all years";
	panelby year / novarname columns=1;
	hbar country / response=num nozerobars categoryorder=respdesc;
	rowaxis label="Country";
	colaxis label="Number of Tourists in thousands";
run;

 

Here is the output for the first two years in the series followed by the last two years

 

RES1.png

 

RES2.png

 

The first thing to  note is that we don't have data for all countries and all years e.g. we don't have data for France for 1995. I also found a few surprising features of the data

 

  1. In the early years of the series Mexico and Poland frequently feature at or near the top of the rankings. Presumably Mexico receives a lot of tourists from the USA but Poland was a surprise - it's not one of the Mediterranean countries which are normally thought of as popular European holiday destinations;
  2. France is consistently the number one destination for tourists whenever we have data for it;
  3. China has shot up the rankings;
  4. Croatia has entered the top ten list in recent years, supplanting Slovenia as the former Yugoslavia destination of choice.

 

Now it's Your Turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

 

Visit [[this link]] to see all the Free Data Friday articles.

Version history
Last update:
‎08-03-2021 09:47 AM
Updated by:

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags