This is the time of year when people often start to look forward and plan for next year's vacation. It helps to lift spirits during the cold, dark winter months. However, people’s idea of what makes a great vacation varies a lot. Some people like to spend their time sunning themselves on a beach while others like to learn about the history and culture of the places they visit. In this article we will be using SAS to analyze data from the Mapping Museums Project to help people who fall into that latter category and are visiting the UK to plan their vacation.
The data can be downloaded from the project web site here in several different formats. I initially downloaded the data as a CSV file but as I had a couple of minor issues importing it I converted the file to XLSX format and uploaded it to SAS OnDemand.
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:
Firstly I used the XLSX libname engine to read the file as a SAS data set.
libname xl xlsx "/home/chris52brooks/Museums/MappingMuseumsData2021_09_30.xlsx";
The file contains a large number of variables and records for places which are no longer open. In order to to make it a little easier to work with I used the following code to remove those closed places (open venues have a value in the year_closed variable of “9999”) and to only keep variables which might be interesting.
data museums_abbr(drop=year_closed);
set xl.MAPPINGMUSEUMSDATA2021_09_30(where=(substr(year_closed,1,4) eq "9999") keep=
name_of_museum
size
subject_matter
admin_area
postcode
latitude
longitude
year_closed);
run;
The first thing I wanted to do was find out which were the predominant type of museums in the UK. To do this all I need is a simple PROC FREQ followed by a PROC SORT.
proc freq data=museums_abbr;
table subject_matter / out=freq_subject;
run;
proc sort data=freq_subject out=subject_sorted;
by descending count;
run;
I then used PROC SGPLOT to generate a bar chart showing me the top 10 museum categories.
ods graphics / reset;
proc sgplot data=subject_sorted(obs=10);
title1 "Top 10 Museum Subjects in the UK";
footnote j=r "Data From: The Mapping Museums Project";
hbar subject_matter / response=count
datalabel datalabelattrs=(weight=bold) categoryorder=respdesc ;
xaxis grid label="Total Museums";
yaxis grid label="Museum Subject";
run;
This gives me the answer but the labels on the vertical axis look ugly, with hyphens and underscores where you would expect spaces. I therefore ran a Data step to replace those characters with spaces using the translate function and regenerated my chart.
data subject_sorted2;
set subject_sorted;
subject_matter=translate(subject_matter," ","-_");
run;
ods graphics / reset;
proc sgplot data=subject_sorted2(obs=10);
title1 "Top 10 Museum Subjects in the UK";
footnote j=r "Data From: The Mapping Museums Project";
hbar subject_matter / response=count
datalabel datalabelattrs=(weight=bold) categoryorder=respdesc ;
xaxis grid label="Total Museums";
yaxis grid label="Museum Subject";
run;
I can see that the two biggest categories are local history and large (what we often call “stately” houses).
Of course, if you’re planning a vacation you will want to know what’s available in the area you’re planning to visit. For example, if you’re visiting my home city, Cardiff, you could use the following Proc SQL statement to filter the places using the admin_area field and then print a list of what is available.
proc sql;
create table cardiff_museums
as select *
from museums_abbr
where find(admin_area,"Cardiff");
quit;
proc print data=cardiff_museums;
var name_of_museum subject_matter size postcode;
title "Museums in Cardiff";
footnote
"Data from The Mapping Museums Project";
run;
I’ve visited most of these places and can strongly recommend them – there’s the added bonus of them all being within a reasonable distance of each other and you have their Post Code to plug into your Sat Nav to find your way there!
Data downloaded from the Mapping Museums website at www.mappingmuseums.org, Accessed on 5 November 2022.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Hit the orange button below to see all the Free Data Friday articles.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.