Everyone has their own favourite Christmas movie – whether it’s the classic tearjerker It’s a Wonderful Life, Bruce Willis saving the day in Die Hard or fun for the young (and young at heart) in Frozen we watch these movies over and over. In our house the firm favourite is National Lampoon’s Christmas Vacation. Every year we watch and laugh at the misadventures of Clark Griswold as he tries and fails to create the perfect family Christmas. Much of the humor in the movie is slapstick and in slapstick nobody is ever really hurt badly or stays hurt for long. Real life is of course different and every year people around the world will spend at least part of the holidays in the emergency room of their local hospital being treated for Christmas related injuries. So, in this edition of Free Data Friday, we will be looking at data from the US Consumer Product Safety Commission relating to injuries caused by Christmas decorations to see if we can find out a little more about the types of accidents which occur and if we can use that information to keep ourselves a little safer during the holidays.
The Commission operates the National Electronic Injury Surveillance System (NEISS) which has a query builder tool allowing you to specify what data you want to retrieve. I used the builder to retrieve data for the ten-year period 2011-2020. The query builder gives you the option to download the data in XLSX, tab delimited or SAS format. Unfortunately, I couldn’t load the SAS file in SAS Studio or access the XLSX file with the XLSX library engine (I believe they must be corrupted in some way) and so I had to use the tab delimited option. Fortunately, there were no problems with this format.
I also downloaded the NEISS Coding Manual which is essential for understanding many of the fields in the downloaded data which are represented as codes.
Firstly, I used Proc Import to bring the data into a SAS data set.
filename reffile '/home/chris52brooks/ChristmasInjuries/NEISS.TXT'; Proc Import datafile=reffile dbms=dlm out=injuries replace; guessingrows=1000; delimiter='09'x; getnames=yes; run;
There are lots of variables in the file, so I had to decide where to focus my analysis. I picked on three areas
The body part which was injured; and
This is what the three fields look like in the imported file.
As you can see the problem with these variables is that they are all represented by codes and so I decided to use the coding manual to create custom formats which would hold the textual equivalents of these codes. I abbreviated some of the text values from the coding manual as they were unnecessarily lengthy for my needs. In addition, I created two formats for product code (prodcode and prodcodeyax). The format prodcodeyax was created for using the formatted values as bar labels on the Y axis of one of the charts – I added tildes at selected points to the text to be used as split characters, telling SAS to split the text onto a new line when it encounters a tilde, as will be seen later.
/* From NEISS Coding Manual at */ /* https://www.cpsc.gov/s3fs-public/2019_NEISS_Coding_Manual.pdf */ libname neissfmt "/home/chris52brooks/Formats/NEISS"; /* Abbreviated Body Parts */ proc format lib=neissfmt.neisscodes; value bodypart 33="Arm, lower" 80="Arm, upper" 37="Ankle" 94="Ear" 32="Elbow" 77="Eyeball" 76="Face" 92="Finger" 83="Foot" 82="Hand" 75="Head" 0="Internal" 35="Knee" 36="Leg, lower" 81="Leg, upper" 88="Mouth" 89="Neck" 38="Pubic region" 30="Shoulder" 93="Toe" 79="Trunk, lower" 31="Trunk, upper" 34="Wrist" 84="25-50% of body" 85="All parts of body" 87="Not recorded"; run; proc format lib=neissfmt.neisscodes; value diag 41 = "Ingestion" 42 = "Aspiration" 46 = "Burns, Electrical" 47 = "Burns, Not Specified" 48 = "Burns, Scald" 49 = "Burns, Chemical" 50 = "Amputaion" 51 = "Burns, Thermal" 52 = "Concussions" 53 = "Contusions, Abrasions" 54 = "Crushing" 55 = "Dislocation" 56 = "Foreign Body" 57 = "Fracture" 58 = "Hematoma" 59 = "Laceration" 60 = "Dental Injury" 61 = "Nerve Damage" 62 = "Internal Organ Injury" 63 = "Puncture" 64 = "Strain, Sprain" 65 = "Anoxia" 66 = "Hemorrhage" 67 = "Electric Shock" 68 = "Poisoning" 69 = "Submersion" 71 = "Other/Not Stated" 72 = "Avulsion" 73 = "Burns, Radiation" 74 = "Dermatitis, Conjunctivitis"; run; proc format lib=neissfmt.neisscodes; value prodcode 1141="Containers" 1701="Artificial Christmas Trees" 1711="Christmas Tree Lights" 1712="Christmas Tree Stands or Supports" 1729="Christmas Decorations, Nonelectric" 1736="Christmas Lights Electric (excl Christmas Tree Lights)" 1807="Floors or Flooring Materials" 1842="Stairs or Steps" 4074="Chairs" 4078="Ladders"; run; proc format lib=neissfmt.neisscodes; value prodcodeyax 1141="Containers" 1701="Artificial Christmas Trees" 1711="Christmas Tree Lights" 1712="Christmas Tree ~ Stands or Supports" 1729="Christmas Decorations, ~ Nonelectric" 1736="Christmas Lights Electric ~ (excl Christmas Tree Lights)" 1807="Floors or Flooring Materials" 1842="Stairs or Steps" 4074="Chairs" 4078="Ladders"; run;
Because this looked like a useful data source which I might want to return to at some point in the future I decided to save these formats in a permanent format catalog (neissfmt.neisscodes). This has several advantages over creating them on the fly for every program in which I need to use them.
I can ensure consistency between my programs; and
I can more easily share them with others.
I strongly recommend, however, that when you create permanent format catalogs you save the source code you used to create them in a separate file. If you need to amend the format, then the source code is essential.
This is what the file looks like with the formats bodypart, diag and prodcode applied.
To get the counts for the three variables I had to run Proc Freq three times to output the totals to different data sets. I then ran Proc Rank on the data sets, so I was able to display results for the top ten in each file.
proc freq data=injuries noprint; tables body_part / out=bpartcount; run; proc freq data=injuries noprint; tables diagnosis / out=diagcount; run; proc freq data=injuries noprint; tables product_1 / out=prodcount; run; proc rank data=bpartcount out=bpartrank descending; var count; ranks ranking; run; proc rank data=diagcount out=diagrank descending; var count; ranks ranking; run; proc rank data=prodcount out=prodrank descending; var count; ranks ranking; run;
I wanted to use a nice Christmas themed image as a background for my charts. In order to do that, once I had chosen my image, I first created an annotation data set.
data annodata; length function $5 height 8. width 8. transparency 8. drawspace $12. image $150. layer $4.; function="image"; height=100; width=100; transparency=0.4; drawspace="GraphPercent"; image="/home/chris52brooks/ChristmasInjuries/christmas-decorations-g70db98171_1920.jpg"; layer="back"; run;
If you want to do this, you can vary the transparency and of course the image name but the other variable names and values are best left unchanged.
I then ran three Proc SGPlot statements to display the top ten values of injuries by body part, diagnosis and product.
/* Declare the library which holds the custom formats */ libname neissfmt "/home/chris52brooks/Formats/NEISS"; /* Add that library to the search path for formats */ options fmtsearch=(neissfmt.neisscodes); proc sgplot data=bpartrank(where=(ranking <11)) nowall noborder sganno=annodata pad=0; format body_part bodypart.; title1 "Christmas Decoration Related Injuries 2011-2020"; title2 "Injuries by Body Part (Top Ten)"; footnote j=l "Data From the National Electronic Injury Surveillance System (NEISS)"; hbar body_part / response=percent datalabel datalabelattrs=(weight=bold) categoryorder=respdesc fillattrs=(color=red transparency=0.7); xaxis grid label="Percentage of Injuries" ; yaxis grid label='Body Part'; run; proc sgplot data=diagrank(where=(ranking <11)) nowall noborder sganno=annodata pad=0; format diagnosis diag.; title1 "Christmas Decoration Related Injuries 2011-2020"; title2 "Injuries by Diagnosis (Top Ten)"; footnote j=l "Data From the National Electronic Injury Surveillance System (NEISS)"; hbar diagnosis / response=percent datalabel datalabelattrs=(weight=bold) categoryorder=respdesc fillattrs=(color=red transparency=0.7); xaxis grid label="Percentage of Injuries"; yaxis grid label='Diagnosis'; run; proc sgplot data=prodrank(where=(ranking <11)) nowall noborder sganno=annodata pad=0; format product_1 prodcodeyax.; title1 "Christmas Decoration Related Injuries 2011-2020"; title2 "Injuries by Product (Top Ten)"; footnote j=l "Data From the National Electronic Injury Surveillance System (NEISS)"; hbar product_1 / response=percent datalabel datalabelattrs=(weight=bold) categoryorder=respdesc fillattrs=(color=red transparency=0.7); xaxis grid label="Percentage of Injuries"; yaxis grid label='Product' fitpolicy=splitalways splitchar="~"; run;
There are several things to note about this code.
I use the FMTSEARCH option to add the library holding my custom formats to the search path SAS will use when looking for formats; and
I set the bar transparency to 0.7 so that you can see the background image through the bars; and
In the final call to Proc SGPlot I set FITPOLICY and SPLITCHAR in order to control the appearance of the bar labels along the Y axis as described earlier.
Here are the outputs from the Proc SGPlot calls
So, what does all this tell us? The first chart shows that lower trunk injuries are the most common – looking through the narrative field for these injuries shows a lot of lower back pain complaints incurred when putting up and taking down decorations either caused by stretching, lifting boxes of decorations or falls from ladders, chairs etc. The second chart and narratives show the most common types of injuries are lacerations (usually caused by broken Christmas ornaments) and finally the most dangerous products are non-electric Christmas decorations. In conclusion then when you’re putting your decorations up or taking them down watch your back, don’t cut yourself on those broken ornaments and beware of the tinsel – but above all have a very happy Christmas!
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Hit the orange button below to see all the Free Data Friday articles.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.