SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.
With climate change continuing to make headlines, the search for a clean, renewable source of energy is certain to gain added impetus. Wind, solar, tidal, geothermal and other sources have all been tapped, but each has its own issues as well as opportunities.
In this edition of Free Data Friday we will be looking at solar power generation, using data provided by the University of Sheffield, whose Solar PV project provides live 30-minute estimates of solar power generation throughout the United Kingdom.
The data was downloaded as a series of CSV files from the main Solar PV page. You can only download a maximum of 12 months of data at a time, so I downloaded three separate files: one for 2017, one for 2018 and one for 2019. I then used PROC IMPORT to read the files into SAS.
filename solfile '/folders/myshortcuts/Dropbox/PV_Live Historic Results_2019.csv';
proc import datafile=solfile
dbms=csv
out=solar2019
replace;
getnames=yes;
run;
filename solfile '/folders/myshortcuts/Dropbox/PV_Live Historic Results_2018.csv';
proc import datafile=solfile
dbms=csv
out=solar2018
replace;
getnames=yes;
run;
filename solfile '/folders/myshortcuts/Dropbox/PV_Live Historic Results_2017.csv';
proc import datafile=solfile
dbms=csv
out=solar2017
replace;
getnames=yes;
run;
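Because the three PROC IMPORT steps differ only in the year, they could also be generated with a small macro loop. This is a sketch that assumes the same folder and file-naming pattern as above; the macro name %import_solar is hypothetical.

```sas
/* Hypothetical helper: import one PV_Live CSV per year into WORK.SOLARyyyy */
%macro import_solar(first=2017, last=2019);
  %do yr=&first %to &last;
    /* Double quotes so &yr resolves; "&yr..csv" becomes e.g. "2017.csv" */
    filename solfile "/folders/myshortcuts/Dropbox/PV_Live Historic Results_&yr..csv";
    proc import datafile=solfile
      dbms=csv
      out=solar&yr
      replace;
      getnames=yes;
    run;
  %end;
%mend import_solar;

%import_solar(first=2017, last=2019)
```

Adding another year then only means changing the LAST= value, rather than copying another block of code.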
This is what the files looked like:
I then appended the files together and extracted the date from the datetime field. I did this because the datetime values were causing issues in PROC TIMESERIES, which I wanted to use for my analysis. As I wasn't going to explore time of day in this analysis, I could simply use the date portion.
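data solar;
set solar2017 solar2018 solar2019;
date=datepart(datetime_gmt); /* keep only the calendar date */
format date date9.;          /* display the SAS date as e.g. 01JUL2019 */
run;
proc sort data=solar;        /* PROC TIMESERIES expects data sorted by the ID variable */
by date;
run;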
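To see what DATEPART does, here is a tiny illustration on one hard-coded value (not taken from the PV_Live files). A SAS datetime counts seconds since midnight on 1 January 1960, while a SAS date counts days since that same point, so DATEPART maps every half-hourly reading from the same day to the same date value.

```sas
/* Illustration only: DATEPART applied to a single datetime literal */
data _null_;
  dt = '01JUL2019:12:30:00'dt;  /* SAS datetime: seconds since 01JAN1960:00:00:00 */
  d  = datepart(dt);            /* SAS date: days since 01JAN1960 */
  put dt= datetime20. d= date9.;
run;
```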
I decided to use PROC TIMESERIES for my analysis. You may not have come across this procedure before, but if you plan to do any time series analysis you will find it invaluable. It's part of SAS/ETS (Econometrics and Time Series), which is available in SAS University Edition and SAS OnDemand for Academics (University Edition does not include every ETS procedure, and OnDemand includes some additional ones). It can perform a wide range of simple transformations and analyses of date- and time-stamped data.
Here I want to aggregate two of the fields, using a different method of aggregation for each. Specifically, I want to convert the 30-minute observations to daily observations, summing the power generation but taking the largest value per day for the installed capacity.
In the code below you'll see that the interval is set to day, which means I want to convert the periodicity of the file to daily observations. PROC TIMESERIES allows only one ID statement, but you can have several VAR statements that specify how each variable should be treated. In this case I want a different ACCUMULATE= method for each of the two variables, and this is allowed. Here is the code and output from PROC TIMESERIES:
proc timeseries data=solar out=timeseries;
id date interval=day;
var generation_mw / accumulate=total;
var installedcapacity_mwp / accumulate=maximum;
run;
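To make the two ACCUMULATE= methods concrete, here is a sketch on a tiny made-up dataset. TOY and TOY_DAILY are hypothetical names, and the numbers are invented for illustration rather than taken from PV_Live; each date simply repeats, as the half-hourly rows do after DATEPART.

```sas
/* Hypothetical toy data: two readings on each of two days */
data toy;
  input date :date9. generation_mw installedcapacity_mwp;
  format date date9.;
  datalines;
01JUN2019 100 500
01JUN2019 300 500
02JUN2019 200 500
02JUN2019 400 510
;
run;

proc timeseries data=toy out=toy_daily;
  id date interval=day;
  var generation_mw / accumulate=total;           /* sum of the readings per day */
  var installedcapacity_mwp / accumulate=maximum; /* largest reading per day */
run;

proc print data=toy_daily;
run;
```

With these inputs TOY_DAILY should hold one row per day: a total of 400 and a maximum of 500 for 1 June, and 600 and 510 for 2 June.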
I can now plot the results using PROC SGPLOT, starting with the power generated per day.
title1 "UK Solar Power Generation";
title2 "Photovoltaic Power Generation Estimate (Megawatts)";
footnote1 j=l "Data From The University of Sheffield";
footnote2 j=l "https://www.solar.sheffield.ac.uk/pvlive/";
proc sgplot data=timeseries;
format generation_mw comma7.;
series x=date y=generation_mw;
xaxis label="Date";
yaxis label="Power Generation Estimate (Megawatts)";
run;
You won't need to do a seasonal analysis to see that there is a strong seasonal effect here. There is quite a lot of noise in the data, but there is still a very strong rise in power generated during the summer months compared to the rest of the year. That is what you would expect, so let's plot the installed capacity to see if there is anything interesting there. Note that, as in my previous Free Data Friday article, I'm using a step chart.
title1 "UK Solar Power Generation";
title2 "Installed Capacity (Megawatt Peak)";
footnote1 j=l "Data From The University of Sheffield";
footnote2 j=l "https://www.solar.sheffield.ac.uk/pvlive/";
proc sgplot data=timeseries;
format installedcapacity_mwp comma7.;
step x=date y=installedcapacity_mwp;
xaxis label="Date";
yaxis label="Installed Capacity (Megawatt Peak)";
run;
This is interesting: in early 2017 there was a sharp rise in capacity, after which growth steadied but continued until early 2019, when capacity flatlined for the rest of the year. Let's see whether the additional capacity resulted in increased generation by using another PROC TIMESERIES call, this time aggregating by year:
proc timeseries data=solar out=timeseries_year;
format generation_mw comma10.;
id date interval=year;
var generation_mw / accumulate=total;
var installedcapacity_mwp / accumulate=maximum;
run;
This gives us the following dataset:
We can see that, despite the higher installed capacity in 2019, power generation actually fell slightly (presumably because of weather conditions). So what can we draw from this evidence?
That solar power generation is not driven by installed capacity alone; and
That, after a period of explosive growth, the increase in capacity seems to be tailing off. Does this mean another source is supplanting solar as a favoured source of energy?
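To put a number on "fell slightly", the yearly rows can be compared with each other. The sketch below assumes the TIMESERIES_YEAR dataset created above (one row per year, in date order) and assumes that each half-hourly generation_mw value is the average power over its 30-minute period, so multiplying the annual total by 0.5 hours gives energy in megawatt hours. YEARLY_CHANGE and its new variables are hypothetical names.

```sas
data yearly_change;
  set timeseries_year;
  /* Assumption: each half-hourly MW estimate covers 0.5 hours, so sum * 0.5 = MWh */
  energy_mwh  = generation_mw * 0.5;
  prev_energy = lag(energy_mwh);  /* previous year's value; missing on the first row */
  if not missing(prev_energy) then
    pct_change = 100 * (energy_mwh - prev_energy) / prev_energy;
  format date year4. energy_mwh prev_energy comma14. pct_change 6.2;
run;

proc print data=yearly_change noobs;
  var date installedcapacity_mwp energy_mwh pct_change;
run;
```

LAG is called on every row rather than inside the IF, because a conditionally executed LAG returns the value from the last time it ran, not from the previous row. The percentage change is the same whether it is computed on the raw totals or on MWh, since the 0.5 factor cancels.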
To sum up, it seems that solar power will not be the complete answer for power generation in the UK. It's simply not predictable enough as a source, but it will probably form part of an overall portfolio of renewable sources, including wind and possibly tidal power.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Happy Learning!
Visit [[this link]] to see all the Free Data Friday articles.