SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
Access Now
Many, if not most of us, will at some point have used census data during the course of their work or study. Governments have been conducting censuses of their population for centuries, but there are other types of census.
Eurostat (the statistical office of the European Union) conducts a census of agricultural animals present in the EU and candidate member countries. In this edition of Free Data Friday, we will be looking at that census data to find out where these animals are and how many of them there are.
Get the Data
The report of the census contains links to the individual data tables for four species of animal – cattle, pigs, sheep and goats. These tables can be downloaded in a number of formats. I chose to download the tables as CSV files and import them into SAS using PROC Import.
Get Started with SAS OnDemand for Academics
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:
Get Started
Getting the Data Ready
Given that I had four separate files that are all of the same format, I decided to create a macro to handle the import, data cleaning, and graphing of the results. Here is the macro code along with the calls to the macro for the import of each of the files:
%macro import_lstock(animal_name,long_desc,common_name);
%let reffile=/folders/myshortcuts/Dropbox/&animal_name..csv;
filename reffile "&reffile";
proc import datafile=reffile
dbms=csv
out=&animal_name;
getnames=yes;
datarow=2;
guessingrows=100;
run;
data live_&animal_name;
length num_animals 8.;
format num_animals comma8.;
set &animal_name(where=(animals="Live &long_desc"
and geo not like "European%" and time="2018"));
if value=":" then value=missing;
if scan(geo,1)="Germany" then geo="Germany";
if scan(geo,1)="Kosovo" then geo="Kosovo";
num_animals=input(value,comma8.);
run;
proc sort data=live_&animal_name out=sorted_&animal_name;
by num_animals;
run;
ods graphics / reset width=6.4in height=8in imagemap;
title "Number of &common_name in each Eurostat Country - 2018";
footnote j=l 'Data from Eurostat';
proc sgplot data=sorted_&animal_name;
styleattrs backcolor=lightblue wallcolor=lightblue;
hbar geo / response=num_animals datalabel fillattrs=(color=gold);
yaxis discreteorder=data reverse label="Country";
xaxis grid label="Number of Animals (thousands)";
run;
ods graphics / reset;
%mend import_lstock;
%import_lstock(cows,bovine animals,Cows);
%import_lstock(sheep,sheep,Sheep);
%import_lstock(pigs,swine%str(,) domestic species,Pigs);
%import_lstock(goats,goats,Goats);
The PROC Import was straightforward. However, the files contained some aggregate records for EU totals that had to be removed in the following data step along with the removal of some text in certain country names which was, for our purposes, extraneous (i.e. explanations surrounding Germany reunification, Kosovo and Macedonia). Additionally, missing values were, for some reason, represented by a colon so that needed to be edited. The files were then individually sorted and displayed through PROC SGPlot.
The Results
Here is the output from the PROC SGPlot call for each of the files:
Who leads in cows, sheep, pigs, and goats?
Taking them in order, France is the country with the most cows, the United Kingdom has the most sheep, Spain the most pigs and Greece is home to the most goats. Other significant points from the results are:
Slightly surprisingly to me: Turkey has the second largest number of cows in the Eurostat countries; and
Spain figures highly in all four graphs coming in first in one category and second in two others; and
Despite the fame of Danish bacon, Denmark is only fourth in the list of pig rearing countries; and
Countries such as Germany and the United Kingdom, which figure heavily in the first three charts, fall a long way down in the final chart of goat population.
I then decided to find the total of each animal across all the Eurostat countries. Firstly, I appended the four files into one file and then used PROC SQL for the summation; as the numbers in the downloaded files are in thousands, I multiplied the totals by 1,000. Here is the code and the output:
%if %sysfunc(exist(all_animals)) %then %do;
proc datasets lib=work;
delete all_animals;
quit;;
%end;
proc append base=all_animals data=live_cows;
run;
proc append base=all_animals data=live_sheep force;
run;
proc append base=all_animals data=live_pigs force;
run;
proc append base=all_animals data=live_goats force;
run;
title "Total Number of Farm Animals in Eurostat Countries";
proc sql;
select animals, sum(num_animals)*1000 as total format=comma12.
from all_animals
group by animals
order by animals;
quit;
The most common farm animals in the Eurostat countries are pigs (live swine) with 153 million animals, a number larger than the population of any of the individual countries.
Now it's your Turn!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.
... View more