BookmarkSubscribeRSS Feed

Discovering America's Most (and Least) Dangerous Jobs with SAS

Started ‎05-31-2019 by
Modified ‎08-03-2021 by
Views 3,368

SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option. Hit the orange button below to start your journey with SAS OnDemand for Academics:

 

Access Now

 

Every year thousands of people die in workplace incidents. The type of incident varies enormously – exposure to dangerous chemicals, road traffic accidents, assaults and so on. In this edition of Free Data Friday, we will be looking at Federal data to discover not only America’s most dangerous jobs but also its least dangerous and you may be pleasantly surprised by the conclusion from that second analysis….

 

Get the Data

 

You can get the data from the Bureau of Labor Statistics (BLS) web site by downloading it in XLSX format

 

Get Started with SAS OnDemand for Academics

 
In this 9-minute tutorial, SAS instructor @DomWeatherspoon shows you how to get your data into SAS OnDemand for Academics and other key steps:

Get Started

 

Getting the Data Ready

 

Although XLSX files are easy to read into SAS using the XLSX Libname engine the layout of the downloaded file posed significant problems. You can see from the screen shot below that multiple tables are included on the same sheet and additionally the table that we will be using includes totals and partial splits of the totals which don’t add up to the totals themselves. In order to make the process easier I edited the spreadsheet to remove tables I didn’t want, delete rows for occupations other than Major Groups defined by in the Standard Occupational Classification codes (codes ending 0000) and also edited the column titles. The final sheet I used is an attachment to this article.

 

Original FileOriginal File

 

The Results

 

As I mentioned in the introduction, I want to discover both the most and least dangerous jobs and I also want to take into account the fact that some jobs have more workers than others and also work more hours than others. Obviously, the fatalities variable gives me the absolute number of fatalities for each occupational group while fatal_injury_rate is defined by the BLS as the number of fatalities per 100,000 full-time equivalent workers and is their recommended metric for making these comparisons.

 

Having identified the variables I want to use I first wrote a simple data step to strip out any residual spaces which may exist at the beginning and end of the occupation variable caused by formatting in the spreadsheet. Then, in order to get the top ten for each category I could sort the data set by each variable in turn by ascending then descending values. Considering that I have four variables that’s a lot of sorting. There is an alternative however – the Rank Procedure which allows the creation of rank values for multiple variables in one procedure call. For each call the ranked order must be for either ascending or descending values but it still only requires two calls as opposed to eight separate sorts. Here is the code run so far:

 

 

data fatality_rates;
	set xl.cfoi_rates_2017hb;
	occupation=strip(occupation);
run;

proc rank data=fatality_rates out=most_rankings descending ties=low;
	var fatal_injury_rate fatalities;
	ranks rate_rank fatality_rank;
run;

proc rank data=fatality_rates out=least_rankings  ties=low;
	var fatal_injury_rate fatalities;
	ranks rate_rank fatality_rank;
run;

 

The output data sets look like this:

 

Most FatalitiesMost Fatalities

 

Least FatalitiesLeast Fatalities

 

Now that I’ve got variables which allow me to extract the top ten for each category, I can create bar charts to display my results. In this case I’ve chosen horizontal bar charts as I feel that’s a better choice when displaying top n values in order. I’ve also added titles, a footnote giving the data source and some styling.

 

Here’s the code and output for the top ten most dangerous jobs, both by number of fatalities and fatality rate.

 

 

ods graphics / reset width=8in height=8in imagemap;
title1 'USA Fatal Injury Rate by Occupation';
title2 'Number of Fatalities';
title3 'Top Ten by Number of Fatalities';
footnote j=l 'Data from the Bureau of Labor Statistics';
proc sgplot data=most_rankings (where=(rate_rank<=10));
	styleattrs backcolor=lightblue wallcolor=lightblue;
	hbar Occupation / response=fatalities fillattrs=(color=gold) categoryorder=respdesc;
	yaxis grid label="Occupation";
	xaxis grid label="Number of Fatalities";
run;

ods graphics / reset width=8in height=8in imagemap;
title1 'USA Fatal Injury Rate by Occupation';
title2 'Rate per 100,000 Full Time Workers';
title3 'Top Ten Rates';
footnote j=l 'Data from the Bureau of Labor Statistics';
proc sgplot data=most_rankings (where=(rate_rank<=10));
	styleattrs backcolor=lightblue wallcolor=lightblue;
	hbar Occupation / response=Fatal_Injury_rate fillattrs=(color=gold) categoryorder=respdesc;
	yaxis grid label="Occupation";
	xaxis grid label="Fatal Injury Rate";
run;

 

Top Ten by Number of Fatalities.png

 

 

Top Ten Fatality Rates.png

 

We can see that while transportation and construction are at the top of the list in terms of absolute number of fatalities when we look at the fatality rate then farming, fishing and forestry occupations leapfrog them into top spot indicating it’s ranking as the most dangerous occupation in the USA.

 

As mentioned in the introduction though I also wanted to find the least dangerous jobs in America – here’s the code and output for those graphs:

 

 

ods graphics / reset width=8in height=8in imagemap;
title1 'USA Fatal Injury Rate by Occupation';
title2 'Number of Fatalities';
title3 'Top Ten by Least Number of Fatalities';
footnote j=l 'Data from the Bureau of Labor Statistics';
proc sgplot data=least_rankings (where=(rate_rank<=10));
	styleattrs backcolor=lightblue wallcolor=lightblue;
	hbar Occupation / response=fatalities fillattrs=(color=gold) categoryorder=respasc;
	yaxis grid label="Occupation";
	xaxis grid label="Number of Fatalities";
run;

ods graphics / reset width=8in height=8in imagemap;
title1 'USA Fatal Injury Rate by Occupation';
title2 'Rate per 100,000 Full Time Workers';
title3 'Top Ten by Lowest Rates';
footnote j=l 'Data from the Bureau of Labor Statistics';
proc sgplot data=least_rankings (where=(rate_rank<=10));
	styleattrs backcolor=lightblue wallcolor=lightblue;
	hbar Occupation / response=Fatal_Injury_rate fillattrs=(color=gold) categoryorder=respasc;
	yaxis grid label="Occupation";
	xaxis grid label="Fatal Injury Rate";
run;

 

Least Number of Fatalities.png

 

Lowest Fatality Rates.png

 

While computer and mathematical occupations and legal occupations are tied for first place in number of fatalities, when you then look at the fatality rate, computer and mathematical occupations are the clear leader as the safest jobs in the country.

 

Of course, this includes SAS professionals so, if you’re reading this article you can relax safe in the knowledge that you work in one of the safest jobs in the country!

 

Now it's Your Turn!

 

Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.

 

Visit [[this link]] to see all the Free Data Friday articles.

Comments

You could do the rankings in a single pass with no explicit sorting using hash objects or arrays in a data step.  I wonder which of the three methods would be faster.  Hash objects might be very fast with the right programming.

Hi Jack - you may very well be right - I've used hash objects a lot for merging very large files and my experience is they're super efficient for that, but the merge statement wins when smaller files are used. I suspect it might be the same with sorting/ranking but I'd need to do some experiments to prove that. Meanwhile Proc Rank has the benefit of simplicity 🙂

Thanks for posting this. Interesting stuff here, nicely explained. Of course, most programming is done sitting, as is most typing. At about 1,800 work hours per year, that's potentially over 70,000 sedentary hours in a 40 year career, which can be lethal in other ways. But that's a subject for a different post 🙂 

Version history
Last update:
‎08-03-2021 02:25 PM
Updated by:

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags