Football (soccer to my American friends) is very much a team game. No matter how good a star player is, they need a good team around them or they won’t shine. Nevertheless, star players are celebrated and receive awards for their achievements. One of the most prestigious of these awards is the Ballon d’Or (French for ‘Gold Ball’), awarded annually since 1956 to the outstanding male player of the year by the French football magazine France Football. Originally limited to European-born players, it was subsequently opened up to any player competing in European professional football and finally to any professional anywhere. The trophy is awarded after a vote by football journalists, international coaches and national team captains, and is highly prized.
In this edition of Free Data Friday, we will be looking at data covering the winners of the Ballon d’Or from 1956 to 2018 to see what we can learn from it about European football.
The data can be downloaded as a CSV file from the Data.World web site (free registration is available).
I used Proc Import to bring the data into a SAS data set. There were no issues with the imported file.
filename reffile '/home/chris52brooks/BallonDor/ballondor.csv';
proc import datafile=reffile
dbms=csv
replace
out=ballon;
guessingrows=700;
getnames=yes;
run;
This is what the file looks like.
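If you want to inspect the imported data set yourself, a quick check along the following lines may help. This is only a sketch: it assumes the data set is named ballon, as in the import step above, and the choice of ten rows is arbitrary.

```sas
/* List the variables and their types in the order they were imported */
proc contents data=ballon varnum;
run;

/* Print the first ten rows (an arbitrary sample size) to eyeball the values */
proc print data=ballon(obs=10) noobs;
run;
```

The Proc Contents output is a convenient place to confirm that rank was read as a numeric variable, which the later where rank=1 conditions rely on.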
First, I used some simple Proc SQL to create tables containing the total number of wins for each player, for each club, and for each player nationality.
proc sql;
create table playerrecord as
select distinct player,
count(rank) as wins
from ballon
where rank=1
group by player
order by wins desc;
quit;
proc sql;
create table clubrecord as
select distinct club,
count(club) as wins
from ballon
where rank=1
group by club
order by wins desc;
quit;
proc sql;
create table countryrecord as
select distinct nationality,
count(nationality) as wins
from ballon
where rank=1
group by nationality
order by wins desc;
quit;
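As an aside, the same three counts can be eyeballed in a single step with Proc Freq. This is a sketch of an alternative, not the approach used in this article, and it prints tables rather than creating the playerrecord, clubrecord and countryrecord data sets that the charts need.

```sas
/* One-way frequency tables for winners only, sorted by descending count */
proc freq data=ballon(where=(rank=1)) order=freq;
    tables player club nationality / nocum nopercent;
run;
```

This is handy as a cross-check: the top rows of each table should match the wins column in the corresponding Proc SQL output.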
I then displayed the results using Proc SGPlot:
ods graphics / reset;
proc sgplot data=playerrecord(obs=5);
title1 "Ballon d'Or Winners (1956-2018)";
title2 "Top 5 Winners";
footnote j=r "Data From: https://data.world";
hbar player / response=wins
datalabel datalabelattrs=(weight=bold) categoryorder=respdesc;
xaxis grid label="Number of Wins";
yaxis grid label="Player";
run;
ods graphics / reset;
proc sgplot data=clubrecord;
title1 "Ballon d'Or Winners (1956-2018)";
title2 "Top Clubs";
footnote j=r "Data From: https://data.world";
hbar club / response=wins
datalabel datalabelattrs=(weight=bold) categoryorder=respdesc;
xaxis grid label="Number of Wins";
yaxis grid label="Club";
run;
ods graphics / reset;
proc sgplot data=countryrecord;
title1 "Ballon d'Or Winners (1956-2018)";
title2 "Top Nationalities";
footnote j=r "Data From: https://data.world";
hbar nationality / response=wins
datalabel datalabelattrs=(weight=bold) categoryorder=respdesc;
xaxis grid label="Number of Wins";
yaxis grid label="Nationality";
run;
This is what the results looked like.
We can see that, up to 2018, two players (Cristiano Ronaldo and Lionel Messi) are tied for the most wins (for the record, Messi also won in 2019 and 2021, putting him ahead; no trophy was awarded in 2020 due to COVID-19). There is also a tie for top club between the two Spanish giants, Barcelona and Real Madrid. However, if we look at the winners' nationalities, we can see something interesting. Despite Spanish clubs winning 22 trophies between them, only 3 wins went to players of Spanish nationality. I decided to find out exactly who had won for them.
proc sql;
create table spanishrecord as
/* group by every non-aggregated column so that wins are counted per player and club */
select player,
club,
nationality,
count(*) as wins
from ballon
where rank=1 and club in ("FC Barcelona", "Real Madrid CF")
group by player, club, nationality
order by club desc;
quit;
ods graphics / reset;
proc sgplot data=spanishrecord;
title1 "Ballon d'Or Winners (1956-2018)";
title2 "Winning Player Nationality for Spanish Clubs";
footnote j=r "Data From: https://data.world";
hbar nationality / response=wins
datalabel datalabelattrs=(weight=bold) categoryorder=respdesc;
xaxis grid label="Number of Wins";
yaxis grid label="Nationality";
run;
Only two players of Spanish nationality (Alfredo Di Stefano with 2 wins and Luis Suarez Miramontes with 1 win) contributed to the Spanish club totals. Moreover, Di Stefano was born and brought up in Argentina and originally played for Argentina.
This, coupled with the fact that Dutch players have won 7 times but only a single win (for Ajax) went to a Dutch club, shows us how cosmopolitan European football is. Players move from country to country a lot, the top players especially so, making the clubs a rich mixture of nationalities and cultures.
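The Dutch figures can be checked with a query along the following lines. Treat it as a sketch: the macro variable country is introduced here purely for illustration, and the value Netherlands is an assumption about how the nationality is spelled in the file, so check the actual values first.

```sas
/* Assumption: Dutch players are recorded as "Netherlands" in the nationality column */
%let country = Netherlands;

proc sql;
    /* Wins per player and club for the chosen nationality */
    select player,
           club,
           count(*) as wins
    from ballon
    where rank=1 and nationality="&country"
    group by player, club
    order by wins desc;
quit;
```

Changing the value of country lets you repeat the comparison for any other nationality in the data.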
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.