What’s this data?
How to download
If you don’t already have University Edition, get it here and, follow the instructions from the pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials.
To download this data, go to the link provided and download the first six datasets from the 2014-2015 season. There are seven, but omit the last one, Drives Against, because it lacks useful data for the task at hand.
How to get the data and prep it for analysis
The code, while lengthy, is pretty simple. The first post using this data focused on rushing defense. This analysis features the same coding method, but merges the rushing defense with all the other aspects of the defense and special teams.
A special challenge of this data is how University Edition treats numbers with decimals. For example, if you have a whole number, like 6, in data where numbers are in the format 3.1, University Edition will record it as .6. The best way to fix this is to go variable by variable and apply math to fix the numbers and format the digits correctly.
Next, the main new task is merging these datasets. The first thing to do is sort all your datasets by team, since we want to merge all of these datasets by team. The merge starts the same as any other data step, but instead of using a set statement or an infile statement, you use a merge statement. Then list all the datasets you’ve created.
The next statement will be a by statement. This indicates which variable you merge by. The choice for us is Team; this will allow us to have one row per team. Now we have a dataset with all these variables and one row per team.
Create your fantasy point calculator
The next challenge is making a fantasy point calculator. The values I use are standard scoring format settings. It is very easy to use math to apply values to all your variables to create new variables that indicate fantasy point values for each team’s interception total, fumble total, etc. Adding in the fantasy rank is the last touch.
filename Team_D "/folders/myfolders/my_data/NFL Team Defense.csv"; filename SpecT_D "/folders/myfolders/my_data/NFL Special Teams Defense.csv"; filename SpecT_O "/folders/myfolders/my_data/NFL Special Teams Offense.csv"; filename Scor_D "/folders/myfolders/my_data/NFL Scoring Defense.csv"; filename Rush_D "/folders/myfolders/my_data/NFL Rushing Defense.csv"; filename Pass_D "/folders/myfolders/my_data/NFL Passing Defense.csv"; data Team_Defense; infile Team_D dlm=',' dsd missover firstobs=3; input Ovr_Rank :2. Team :$22. G :2. Tot_Pts_A :5. Tot_Yds_A :6. Plays :5. Tot_Yds_per_Play :3.1 Tot_TOs :3. Fmbl_Rec :3. Fst_Dwns_Pty :4. Fst_Dwns :5. P_Cmp_A :5. P_Att_A :5. P_Yds_A :6. P_TD_A :3. Int :3. Net_P_Yds_per_Att :6. P_Fst_Dwns_A :4. R_Att_A :5. R_Yds_A :5. R_TD_A :3. R_Yds_per_Att :3.1 R_Fst_Dwns_A :4. SC_Pct :1. TO_Pct :1. Tot_Exp_Pts_Cont :7.2; if anydigit(Ovr_Rank)=0 then delete; if Tot_Yds_per_Play < 1 then Tot_Yds_per_Play=Tot_Yds_per_Play * 10; if R_Yds_per_Att < 1 then R_Yds_per_Att=R_Yds_per_Att * 10; drop SC_Pct TO_Pct; run; data Special_Teams_Defense; infile SpecT_D dlm=',' dsd missover firstobs=3; input SpT_D_Rank :2. Team :$22. G :2. Punt_Ret_D :4. Punt_Yds_D :4. Punt_TD_D :2. Punt_Long_D :2. Punt_Yds_per_Ret_D :4.1 Kick_Ret_D :4. Kick_Yds_D :4. Kick_TD_D :2. Kick_Long_D :2. Kick_Yds_per_Ret_D :4.1 All_Purp_Yds_D :6.; if anydigit(SpT_D_Rank)=0 then delete; if Punt_Yds_per_Ret_D<=1 then Punt_Yds_per_Ret_D=Punt_Yds_per_Ret_D * 10; if Kick_Yds_per_Ret_D< 10 then Kick_Yds_per_Ret_D=Kick_Yds_per_Ret_D * 10; drop Punt_Long_D Kick_Long_D All_Purp_Yds_D; run; data Special_Teams_Offense; infile SpecT_O dlm=',' dsd missover firstobs=3; input SpT_O_Rank :2. Team :$22. G :2. Punt_Ret_O :4. Punt_Yds_O :4. Punt_TD_O :2. Punt_Long_O :2. Punt_Yds_per_Ret_O :4.1 Kick_Ret_O :4. Kick_Yds_O :4. Kick_TD_O :2. Kick_Long_O :2. Kick_Yds_per_Ret_O :4.1 All_Purp_Yds_O :6.; if anydigit(SpT_O_Rank)=0 then delete; if Punt_Yds_per_Ret_O<=1 then Punt_Yds_per_Ret_O=Punt_Yds_per_Ret_O * 10; if Kick_Yds_per_Ret_O< 10 then Kick_Yds_per_Ret_O=Kick_Yds_per_Ret_O * 10; drop Punt_Long_O Kick_Long_O; run; data Scoring_Team_Defense; infile Scor_D dlm=',' dsd missover firstobs=2; input Scor_D_Rank :2. Team :$22. G :2. Rsh_TD_A :3. Rec_TD_A :3. PR_TD_A :2. KR_TD_A :2. Fmbl_TD_Ret :2. Int_TD_Ret :2. Oth_TD :2. All_TD :2. _2_PM :2. X_Pts_M :4. FG_M :3. Sfty :2. Tot_Pts_Scrd :5. Pts_per_Game :4.1; if anydigit (Scor_D_Rank)=0 then delete; if Fmbl_TD_Ret=. then Fmbl_TD_Ret=0; if Int_TD_Ret=. then Int_TD_Ret=0; if Sfty=. then Sfty=0; drop Oth_TD All_TD _2_PM X_Pts_M FG_M Tot_Pts_Scrd Pts_per_Game PR_TD_A KR_TD_A Rec_TD_A Rsh_TD_A; run; data Rushing_Defense; infile Rush_D dlm=',' dsd missover firstobs=2; input R_Rank :2. Team :$22. G :2. R_ATT :3. R_YDS :4. R_TD :2. R_Lng :5. R_Y_A :3.1 R_Yds_per_G :5.1 fmb :3. R_Exp_Pts_Cont :6.2; if anydigit(R_Rank)=0 then delete; if R_Yds_per_G < 20 then R_Yds_per_G=R_Yds_per_G * 10; if 0 < R_Exp_Pts_Cont < 1 then R_Exp_Pts_Cont=R_Exp_Pts_Cont * 100; else R_Exp_Pts_Cont=R_Exp_Pts_Cont; drop R_lng fmb R_ATT R_YDS R_TD R_Y_A; run; data Passing_Defense; infile Pass_D dlm=',' dsd missover firstobs=2; input P_Rank :2. Team :$22. G :2. P_C :5. P_A :5. P_Cmp_PCT_A :4.2 P_Y :6. P_TD :3. P_TD_PCT_A :3.1 INT :3. Int_PCT :3.1 P_Lng :2. P_Yds_per_Att :3.1 Adj_P_Yds_per_Att :3.1 P_Yds_per_Cmp :4.1 P_Yds_per_G :5.1 QBR_A :5.1 QBR :1. Sk :4. Sk_Yds :4. N_Y_A :4.2 Adj_Net_Yds_per_Att :3.1 Sk_PCT :3.1 P_Exp_Pts_Cont :7.2; if anydigit(P_Rank)=0 then delete; if P_Cmp_PCT_A < 1 then P_Cmp_PCT_A=P_Cmp_PCT_A *100; if P_TD_PCT_A < 1 then P_TD_PCT_A=P_TD_PCT_A * 10; if Int_PCT <1 then Int_PCT=Int_PCT * 10; if P_Yds_per_Att< 1 then P_Yds_per_Att=P_Yds_per_Att * 10; if Adj_P_Yds_per_Att< 1 then Adj_P_Yds_per_Att=Adj_P_Yds_per_Att * 10; if P_Yds_per_Cmp < 2 then P_Yds_per_Cmp=P_Yds_per_Cmp * 10; if P_Yds_per_G < 30 then P_Yds_per_G=P_Yds_per_G * 10; if QBR_A < 15 then QBR_A=QBR_A * 10; if Adj_Net_Yds_per_Att < 1 then Adj_Net_Yds_per_Att=Adj_Net_Yds_per_Att * 10; if Sk_PCT< 1 then Sk_PCT=Sk_PCT * 10; if -1 < P_Exp_Pts_Cont < 0 then P_Exp_Pts_Cont=P_Exp_Pts_Cont * 100; else P_Exp_Pts_Cont=P_Exp_Pts_Cont; drop P_C P_A P_Y P_TD INT P_Lng N_Y_A QBR; run; proc sort data=Team_Defense; by Team; run; proc sort data=Special_Teams_Defense; by Team; run; proc sort data=Special_Teams_Offense; by Team; run; proc sort data=Scoring_Team_Defense; by Team; run; proc sort data=Rushing_Defense; by Team; run; proc sort data=Passing_Defense; by Team; run; data The_Merge; merge Team_Defense Scoring_Team_Defense Special_Teams_Defense Special_Teams_Offense Rushing_Defense Passing_Defense; by Team; run; data Fantasy_Point_Calculator; set the_merge; F_Pts_Frm_Fmb_Rec_per_G=(Fmbl_Rec/G)*2; F_Pts_Frm_Int_per_G=(Int/G)*2; if (P_Yds_per_G - Sk_Yds)/G < 100 then F_P_Yds_per_G=5 - 5; if 100 <=(P_Yds_A - Sk_Yds)/G < 200 then F_P_Yds_per_G=3 - 5; if 200 <=(P_Yds_A - Sk_Yds)/G < 300 then F_P_Yds_per_G=2 - 5; if 300 <=(P_Yds_A - Sk_Yds)/G < 350 then F_P_Yds_per_G=0 - 5; if 350 <=(P_Yds_A - Sk_Yds)/G < 400 then F_P_Yds_per_G=-1 - 5; if 400 <=(P_Yds_A - Sk_Yds)/G < 450 then F_P_Yds_per_G=-3 - 5; if 450 <=(P_Yds_A - Sk_Yds)/G < 500 then F_P_Yds_per_G=-5 - 5; if 500 <=(P_Yds_A - Sk_Yds)/G < 550 then F_P_Yds_per_G=-6 - 5; if 550 <=(P_Yds_A - Sk_Yds)/G then F_P_Yds_per_G=-7; if R_Yds_per_G < 100 then F_R_Yds_per_G=5 - 5; if 100 <=R_Yds_per_G < 200 then F_R_Yds_per_G=3 - 5; if 200 <=R_Yds_per_G < 300 then F_R_Yds_per_G=2 - 5; if 300 <=R_Yds_per_G < 350 then F_R_Yds_per_G=0 - 5; if 350 <=R_Yds_per_G < 400 then F_R_Yds_per_G=-1 - 5; if 400 <=R_Yds_per_G < 450 then F_R_Yds_per_G=-3 - 5; if 450 <=R_Yds_per_G < 500 then F_R_Yds_per_G=-5 - 5; if 500 <=R_Yds_per_G < 550 then F_R_Yds_per_G=-6 - 5; if 550 <=R_Yds_per_G then F_R_Yds_per_G=-7 - 5; if Tot_Yds_A / G < 100 then F_Tot_Yds_A=5; if 100 <=Tot_Yds_A / G < 200 then F_Tot_Yds_A=3; if 200 <=Tot_Yds_A / G < 300 then F_Tot_Yds_A=2; if 300 <=Tot_Yds_A / G < 350 then F_Tot_Yds_A=0; if 350 <=Tot_Yds_A / G < 400 then F_Tot_Yds_A=-1; if 400 <=Tot_Yds_A / G < 450 then F_Tot_Yds_A=-3; if 450 <=Tot_Yds_A / G < 500 then F_Tot_Yds_A=-5; if 500 <=Tot_Yds_A / G < 550 then F_Tot_Yds_A=-6; if 550 <=Tot_Yds_A / G then F_Tot_Yds_A=-7; if R_TD_A / G * 6=0 then R_F_Pts_per_G=5; if 1 <=R_TD_A / G * 6 < 6 then R_F_Pts_per_G=4; if 6 <=R_TD_A / G * 6 < 13 then R_F_Pts_per_G=3; if 13 <=R_TD_A / G * 6 < 17 then R_F_Pts_per_G=1; if 17 <=R_TD_A / G * 6 < 27 then R_F_Pts_per_G=0; if 27 <=R_TD_A / G * 6 < 34 then R_F_Pts_per_G=-1; if 34 <=R_TD_A / G * 6 < 45 then R_F_Pts_per_G=-3; if 45 <=R_TD_A / G * 6 then R_F_Pts_per_G=-5; if P_TD_A / G * 6=0 then P_F_Pts_per_G=5; if 1 <=P_TD_A / G * 6 < 6 then P_F_Pts_per_G=4; if 6 <=P_TD_A / G * 6 < 13 then P_F_Pts_per_G=3; if 13 <=P_TD_A / G * 6 < 17 then P_F_Pts_per_G=1; if 17 <=P_TD_A / G * 6 < 27 then P_F_Pts_per_G=0; if 27 <=P_TD_A / G * 6 < 34 then P_F_Pts_per_G=-1; if 34 <=P_TD_A / G * 6 < 45 then P_F_Pts_per_G=-3; if 45 <=P_TD_A / G * 6 then P_F_Pts_per_G=-5; if Tot_Pts_A / G=0 then Tot_F_Pts_per_G=5; if 1 <=Tot_Pts_A / G < 6 then Tot_F_Pts_per_G=4; if 6 <=Tot_Pts_A / G < 13 then Tot_F_Pts_per_G=3; if 13 <=Tot_Pts_A / G < 17 then Tot_F_Pts_per_G=1; if 17 <=Tot_Pts_A / G < 27 then Tot_F_Pts_per_G=0; if 27 <=Tot_Pts_A / G < 34 then Tot_F_Pts_per_G=-1; if 34 <=Tot_Pts_A / G < 45 then Tot_F_Pts_per_G=-3; if 45 <=Tot_Pts_A / G then Tot_F_Pts_per_G=-5; F_Pts_Sk_per_G=Sk / G; F_Pts_Sfty_per_G=Sfty / G; F_Pts_Fmbl_TD_per_G=Fmbl_TD_Ret * 6 / G; F_Pts_Int_TD_per_G=Int_TD_Ret * 6 / G; F_Punt_TD_per_G=Punt_TD_O * 6 / G; F_Kick_TD_per_G=Kick_TD_O * 6 / G; Tot_F_Score_per_G=F_Kick_TD_per_G + F_Punt_TD_per_G + F_Pts_Int_TD_per_G + F_Pts_Fmbl_TD_per_G + F_Pts_Sfty_per_G + F_Pts_Sk_per_G + Tot_F_Pts_per_G + F_Tot_Yds_A + F_Pts_Frm_Int_per_G + F_Pts_Frm_Fmb_Rec_per_G; run; proc sort data=fantasy_point_calculator; by descending Tot_F_Score_per_G; run; data add_fantasy_rank; retain Team F_Rank Tot_F_Score_per_G; set fantasy_point_calculator; F_Rank +1; run; proc corr data=add_fantasy_rank; run; proc sgplot data=add_fantasy_rank; bubble x=r_rank y=p_rank size=f_rank; run; quit;
What we’re analyzing
Notice the proc corr statement doesn’t feature a var statement. This allows us to compare all variables instead of the ones specified in the var statement. These data also may not be 100% accurate, as they do not include the kick and punt blocking data, which are used in the standard scoring format.
What does the output mean?
The correlation matrix nicely displays which variables may strongly correlate with fantasy success, but aren’t a direct contributor. For example, do punt return yards correlate well with fantasy rank? If so, maybe that’s the secret statistic that gives you the fantasy edge in your league.
The bubble plot shows the pass rank and rush rank, while the bubble size is determined by fantasy rank; smallest bubble represents highest ranked. The correlation matrix is not included as it is too large for this article, but you can make your own and explore the areas that most interest you.
Now it’s your turn!
Comment with anything interesting you noticed in this data or if this article helped you pick a fantasy defense. And let us know about other interesting ways to use this data.
Need data for learning?
The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:
We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:
Click Analytics U, then select "Subscribe" from the Options menu.