Data visualization with SAS programming

Fun With ODS Graphics: Box Plot Distribution of Top 50 Home Run Leaders of 1910-2015

Contributor tc

HR.png

Drawing inspiration from Curtis Harris' fine "The History of the Single Season Home Run Record," here's code for a quick box plot take on the same subject.

UPDATE: WaPo's "Baseball just saw its biggest home run surge since the steroids era. Here’s why." has a nice explanation of some of the reasons behind the rise and fall of the longball.

* Box plot distribution of top 50 home run leaders of 1910-2015
  Data sourced from Lahman's Baseball Database (seanlahman.com/baseball-archive/statistics);
  
data homers(keep=playerID yearID teamID HR);                   * Grab batting stats;
infile '/folders/myfolders/BaseBall/Batting.csv' dlm=',' dsd truncover firstobs=2;
input playerID : $10. yearID stint teamID : $3. lgID : $2. G AB R H D T HR;

data players(keep=playerID namefirst namelast compress=char);  * Grab player info;
infile '/folders/myfolders/BaseBall/Master.csv' dlm=',' dsd truncover firstobs=2;
input playerID : $10. birthYear birthMonth birthDay birthCountry : $10. birthState : $10. birthCity: $30. 
      deathYear deathMonth deathDay deathCountry : $10. deathState : $10. deathCity: $30. 
      nameFirst: $30. nameLast : $30.;

proc rank data=homers out=HRrank ties=low descending;          * Rank players by HR for each season;
by yearId; var HR; ranks HRrank;

proc sql;                                                      * Merge data, tag batters w/60+ HR, keep top 50;
create table HRleaders as
select yearid, hr,
       case when hr>=60 then '   '||trim(namelast)||" ("||put(hr,z2.)||','||put(yearid,z4.)||")" end as HR60
from HRrank, players where HRrank.playerid=players.playerid and HRrank between 1 and 50;

ods listing image_dpi=300 gpath='/folders/myfolders';          * One boxplot/season for top 50 HR hitters;
ods graphics on / reset antialias width=14in height=11in imagename="HR" antialiasmax=6000;
proc sgplot data=hrleaders(where=(yearid>=1910)) noautolegend;
Title "Top 50 Home Run Leaders 1910-2015";
vbox hr / category=yearid boxwidth=.35;
text x=yearid y=HR text=HR60 / position=right;
xaxis display=(nolabel) valueattrs=(size=7pt) type=linear values=(1910 to 2015 by 5) offsetmax=.005;
yaxis display=(nolabel) valueattrs=(size=7pt) grid;
footnote height=7pt "Note: League followed 154 game schedule until 1961, and 162 games since. Number of games varied widely from 1901-1919, and strikes shortened '72, '81, '94, '95 seasons.";
run;

Grand Advisor

Re: Fun With ODS Graphics: Box Plot Distribution of Top 50 Home Run Leaders of 1910-2015

For added fun: Normalize this to the number of games played.
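
One possible reading of that suggestion, as a rough sketch rather than a finished answer: scale each batter's HR total to a 162-game pace using his own games played (G), then redraw the box plots on that rate. The names homers_rate, HRrateRank, HRper162 and rateRank are made up for this illustration, and the 100-game minimum is an arbitrary assumption to keep a one-game, one-homer season from ranking first. Like the original code, it treats each stint as a separate row.

```sas
* Sketch only: HR per 162 games played, keeping G this time;
data homers_rate(keep=playerID yearID HR G HRper162);
infile '/folders/myfolders/BaseBall/Batting.csv' dlm=',' dsd truncover firstobs=2;
input playerID : $10. yearID stint teamID : $3. lgID : $2. G AB R H D T HR;
if G >= 100;                                                   * Hypothetical cutoff to drop small-sample seasons;
HRper162 = 162 * HR / G;                                       * Scale each season to a 162-game pace;
run;

proc sort data=homers_rate; by yearID;                         * BY processing below needs sorted input;
run;

proc rank data=homers_rate out=HRrateRank ties=low descending; * Rank by rate within each season;
by yearID; var HRper162; ranks rateRank;
run;

proc sgplot data=HRrateRank(where=(yearID>=1910 and rateRank<=50)) noautolegend;
title "Top 50 Home Run Rates (HR per 162 Games Played) 1910-2015";
vbox HRper162 / category=yearID boxwidth=.35;
xaxis display=(nolabel) valueattrs=(size=7pt) type=linear values=(1910 to 2015 by 5);
yaxis display=(nolabel) valueattrs=(size=7pt) grid;
run;
```

If "games played" is meant as the length of the league schedule rather than the player's own games, the same steps would apply with G swapped for the season's scheduled games, which Batting.csv does not supply.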
