SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:
Polls consistently show that one of the things people most enjoy about Christmas is spending time with their families. It’s a time when families gather, exchange presents, eat lots of good food and observe time-honoured family traditions. However, for some people this isn’t possible because they have to, or choose to, work on Christmas Day. In this edition of Free Data Friday we will be looking at an occasional series published by the UK Office for National Statistics (ONS) which tells us what percentage of workers turn up for work on Christmas Day.
The data is available for download as an XLS file from the ONS web site. The latest period for which the data is available is Christmas 2016.
Unfortunately, the ONS has published a number of different data tables on the same worksheet which makes importing the data into SAS a little more complicated. In this scenario the easiest way to proceed is to use the RANGE statement in Proc Import. You can either use an explicit cell reference or an Excel range name. It’s at this point that I have to make a confession – I am a former spreadsheet programmer. I have always said that the key to good spreadsheet programming is using range names. For example, in our case if the table we want is positioned in different cells in different years we can still use the same program without amendment. This wouldn’t be possible if we had used cell references. With this in mind I opened the spreadsheet, created range names for each of the tables and saved the file in the more modern XLSX format. You can find more information about using range names with Proc Import in an excellent blog post from Chris Hemedinger.
I decided to use two of the tables on the sheet so issued two Proc Import statements as follows
proc import datafile="/folders/myshortcuts/Dropbox/worker_christmas.xlsx"
out=region_import
replace
dbms=xlsx;
range="region";
run;
proc import datafile="/folders/myshortcuts/Dropbox/worker_christmas.xlsx"
out=occupation_import
replace
dbms=xlsx;
range="occupation";
run;
This gave me two data sets looking like this
The next steps were to sort the regional file, renaming and dropping some variables and tidying up the occupation file, removing the codes at the start of the occupation name, removing quotation marks and similarly renaming and dropping variables.
proc sort data=region_import(where=(k ne 'Total'))
out=region(rename=(k=region as_percentage_of_region=perc) drop=level);
by descending as_percentage_of_region;
run;
data occupation(keep=occupation perc);
length occupation $40;
set occupation_import(rename=(as_percentage_of_occupation=perc));
occupation=dequote(strip(substr(f,6)));
run;
The data is already aggregated so no further manipulation is required - instead I first printed the region table making use of Proc Print like so
title1 'Who Works Over Christmas?';
title2 'UK Regional Percentages of People Working on Christmas Day';
footnote1 j=c 'Data from the UK Office for National Statistics';
footnote2 j=c 'https://www.ons.gov.uk';
proc print data=region noobs split="*";
label region='Region*';
label perc='Percentage Working*';
run;
This gave me the following listing
One interesting point from this is the position of Scotland at the head of the table. More than twice as many workers in percentage terms work in Scotland on Christmas Day than in London and Northern Ireland. It is true that in Scotland New Year is considered a more important holiday than it is in other parts of the country and Christmas Day a little less so but to my mind this doesn't wholly explain the disparity between Scotland and all the other regions.
For the occupation data set I decided to create a vertical bar chart. Here is the code I used followed by the output
ods graphics / imagemap;
title1 'Who Works Over Christmas?';
title2 'Top 15 Occupations Working on Christmas Day in the UK 2016';
footnote1 j=l 'Data from the UK Office for National Statistics';
footnote2 j=l 'https://www.ons.gov.uk';
proc sgplot data=occupation;
styleattrs
backcolor=biyg
wallcolor=biyg;
vbar occupation /
response=perc
categoryorder=respdesc
fillattrs=(color=red)
dataskin=sheen
datalabel=perc
datalabelpos=data
tip=(occupation perc)
tiplabel=("Occupation:" "Percentge Working:");
xaxis label="Occupation";
yaxis label="Percentage Working";
run;
My initial thought was that this chart would be dominated by the emergency services - fire, police and medical staff but when I saw clergy at the top of the list it made perfect sense. In fact 51.4% seems a little low even though this would be almost exclusively Christian clergy. The other surprise to me was the 8.5% of sales supervisors working when virtually all shops are closed on Christmas Day.
There is, of course, one occupation missing from this chart - it has only one person working in it and is a curious amalgam of delivery services, animal wrangling and toy manufacture. It is, as you might have guessed, the one man who has worked every Christmas Eve and Christmas Day since time immemorial - Santa Claus.
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.
You're absolutely right that using named ranges in Excel can greatly simplify Excel programming. I don't think that many people are aware, for example, that you can use named range intersections to define cells, so in your first example I could refer to "(North_East Level)" instead of having to manually find "$B$2". One table per sheet is also very helpful for auditable programs.
But if you want to do such programming against a workbook created by SAS, how would you do it? I don't see a way in ODS EXCEL, ODS TAGSETS.EXCELXP, or PROC EXPORT to specify named ranges on output, and that could be very helpful. Writing an Excel macro to name ranges in SAS, and then executig that macro once you're in Excel, seems like a kludge.
'Christmas day' could also be sometime between January 6 and January 19, according to some other christian groups.
Best to stick with the original Winter Solstice, which is more stable in so many ways.
And Santa Claus was invented in 1821...hardly 'time immemorial'.
And even St. Nicolas himself was no sooner than the early common era.
Hi @JackHamilton - I'm afraid I don't know of any straightforward way of writing to an Excel named range directly from SAS either. Some years ago I wrote a SAS Global Forum paper on using VBA with ADO to call SAS programs and exchange data but that's possibly a little out of date now and maybe overkill for most scenarios. It does work well though and I have used it a number of times for customers. The second part of the paper covers creating Excel add-ins to wrap these calls where the user specifies the output range which is the closest I can come up with 🙂
You and @tomrvincent are of course correct about Santa being more modern than 'time immemorial' but I think I can defend the argument for his predecessor in the job, St Nick, here. I checked Wikipedia and found that in English Common Law time immemorial is anything before 6 July 1189......
Since the time of Richard the Lionhearted, huh? That's an interesting piece of trivia I would never have guessed.
I think it should be possible to create named ranges with the excelxp tagset using tagattrs to define the ranges, and with proc report it should be possible to define ranges smaller than entire rows or columns. That's beyond my responsibilities and capabilities, but under Clarke's First Law it *is* possible.
It would be better to put that into the excel ODS destination. I don't know anything about how non-tagset destinations are created. Are they extensible in the way tagsets are, do you know?
@JackHamilton this is an intriguing problem. I've found an old paper from SUGI 31which shows how you can create named ranges using the EXCEL engine. I've tried this with the XLSX engine and it works with that too. Unfortunately you can only create new ranges starting at the top left hand cell but if the range already exists elsewhere you can write to that.
I'll keep pondering this and come back if I find anything better.....
too funny!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.