Editor's note: SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.
British politics has been dominated by one subject for the last three years – Brexit. The unexpected victory of leave voters in the 2016 European Union referendum has pushed all other topics to the periphery.
Since that vote, two prime ministers have resigned, the ruling Conservative Party has lost its overall majority in the British Parliament, there have been large-scale defections from both the Conservative and Labour parties in the House of Commons, and new parties have risen and fallen at a rapid pace.
Now, with the October 31 deadline for leaving the EU getting closer and closer, another general election is expected at any time. One feature of this election is expected to be the geographical split between leave and remain areas.
Although the UK voted by a margin of 52% to 48% to leave, that narrow result masks wide geographical differences. In this edition of Free Data Friday, we will be using results data from the UK Electoral Commission to look at a way of measuring the degree of difference between areas in that vote.
You can download the results of the vote in CSV format from the UK Electoral Commission web site and import them into SAS with PROC IMPORT.
The file imported without any significant issues. There are a couple of minor points which we will need to address:
The British Overseas Territory of Gibraltar is part of the EU by virtue of its relationship to the UK, so it took part in the vote. However, it is not represented in the UK Parliament, so we will exclude it from our calculations; and
Some of the fields which should be numeric were imported as character fields and will be converted.
Here is the code for the import and data cleaning:
filename reffile '/folders/myshortcuts/Dropbox/EU-referendum-result-data.csv';

proc import datafile=reffile
    dbms=csv
    out=import
    replace;
    getnames=yes;
    guessingrows=700;
run;

/* Area Code GI is Gibraltar */
data area_results(keep=region area valid_votes remain leave);
    set import(where=(area_code ne "GI"));

    /* Convert the character vote counts to numeric, then swap each */
    /* new numeric variable in under the original variable name.   */
    new_remain = input(remain, 8.);
    drop remain;
    rename new_remain=remain;

    new_leave = input(leave, 8.);
    drop leave;
    rename new_leave=leave;

    new_valid = input(valid_votes, 8.);
    drop valid_votes;
    rename new_valid=valid_votes;
run;
This is what the imported file looks like:
The Index of Dissimilarity can be used to determine the percentage of one constituent part of the population which would have to move areas to achieve a uniform geographic distribution amongst the sub-areas of a larger area. It is often used to determine racial balance in areas within a state or country or gender balance within occupations.
In our case, an index of zero would mean that every counting area had the same 52% leave to 48% remain split as the overall total. The method of calculating the index was discussed in a 2016 SAS Communities Forum thread, and we will be using the PROC SQL statement from that thread to perform the calculation.
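For reference, the calculation can be written as a formula. This is a sketch in notation that does not appear in the original thread: the symbols are assumptions chosen for illustration, with l_i and r_i standing for the leave and remain votes in counting area i, L and R for the overall leave and remain totals, and n for the number of counting areas.

```latex
D = \frac{1}{2} \sum_{i=1}^{n} \left| \frac{l_i}{L} - \frac{r_i}{R} \right|,
\qquad L = \sum_{i=1}^{n} l_i, \quad R = \sum_{i=1}^{n} r_i
```

D is zero when every area holds the same share of the leave total as it does of the remain total, and it approaches one as leave and remain voters become concentrated in entirely separate areas. In the PROC SQL statement, var1 and var2 play the roles of l_i/L and r_i/R.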
Here is the PROC SQL statement referred to earlier which calculates the index:
proc sql;
    create table uk_index as
    select *,
           leave/sum(leave) as var1,
           remain/sum(remain) as var2
    from area_results;

    select 0.5*sum(abs(var1-var2)) as d1
    from uk_index;
quit;
Here is the result:
This means that 16.5% of UK leave voters would have to move counting areas for the vote split to be uniform across all areas. On a turnout of 33,551,983, that is over 5.5 million people, which is a significant imbalance.
This isn't the whole story, however. While England and Wales voted for leave, the other two constituents of the UK, Scotland and Northern Ireland, voted for remain. I decided to see if the internal dissimilarity in England, Wales and Scotland was roughly the same (Northern Ireland was a single counting area, so we can't calculate an index for it).
Here is the code for those calculations:
/* England: every region other than Scotland, Wales and Northern Ireland */
data regional_results_eng;
    set area_results(where=(region not in ("Scotland", "Wales", "Northern Ireland")));
run;

proc sql;
    create table eng_index as
    select *,
           leave/sum(leave) as var1,
           remain/sum(remain) as var2
    from regional_results_eng;

    select 0.5*sum(abs(var1-var2)) as d1
    from eng_index;
quit;

/* Scotland */
data regional_results_scot;
    set area_results(where=(region = "Scotland"));
run;

proc sql;
    create table scot_index as
    select *,
           leave/sum(leave) as var1,
           remain/sum(remain) as var2
    from regional_results_scot;

    select 0.5*sum(abs(var1-var2)) as d1
    from scot_index;
quit;

/* Wales */
data regional_results_wales;
    set area_results(where=(region = "Wales"));
run;

proc sql;
    create table wales_index as
    select *,
           leave/sum(leave) as var1,
           remain/sum(remain) as var2
    from regional_results_wales;

    select 0.5*sum(abs(var1-var2)) as d1
    from wales_index;
quit;
Here is the index value for England:
A score of 15.5% is pretty close to the UK total (perhaps not surprisingly, given the size of England's population relative to the UK total).
This is Scotland's score:
This is a lot less than the UK and England-only values and, of course, with remain having won, only a shade over 273,000 remain voters would have to move areas to achieve uniformity.
Finally, Wales' index is:
This is the smallest of the three index values: only just over 144,700 leave voters would need to move to achieve the perfect zero index.
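The three per-nation calculations above repeat the same steps, so they could probably be collapsed into a single grouped query. The following is a minimal sketch of that idea rather than the method used for the results in this article. It assumes the AREA_RESULTS data set created by the cleaning step; the NATION variable and the NATION_RESULTS and NATION_INDEX data set names are hypothetical and introduced here only for illustration.

```sas
/* Sketch only: NATION, NATION_RESULTS and NATION_INDEX are hypothetical names. */
/* Label each counting area with its nation; all English regions become "England". */
data nation_results;
    set area_results;
    length nation $16;
    if region in ("Scotland", "Wales", "Northern Ireland") then nation = region;
    else nation = "England";
run;

proc sql;
    /* Each area's share of its own nation's leave and remain totals. */
    /* SUM() is evaluated per nation and remerged onto the area rows.  */
    create table nation_index as
    select nation,
           area,
           leave/sum(leave) as var1,
           remain/sum(remain) as var2
    from nation_results
    group by nation;

    /* One index of dissimilarity per nation */
    select nation,
           0.5*sum(abs(var1-var2)) as d1
    from nation_index
    group by nation;
quit;
```

Note that Northern Ireland would appear in this output with an index of zero, simply because it is a single counting area, so that row carries no information.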
The question is, then, what does this mean for the upcoming election? The results imply that, especially in England, leave areas are more pro-leave than the country as a whole and remain areas more pro-remain. If people vote along strict leave/remain lines, then this is likely to lead to a very polarised result, particularly if parties campaign with a strategy of trying to mobilise their base votes rather than win over opposition voters. Having said that, events are moving at a very rapid pace and, as is often said, the only certain thing with Brexit is that nothing is certain!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.