# Picking your new fantasy football defense? Here’s how University Edition can help

by on ‎10-02-2015 01:06 PM - edited on ‎10-20-2015 03:30 PM by (1,554 Views)
• ### Analytics U

What’s this data?

This post, our third in the Free Data Friday series, revisits the data set from Pro-Football-Reference. Using University Edition, you can make informed decisions about your fantasy defense.

If you don’t already have University Edition, get it here and, follow the instructions from the pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorials

To download this data, go to the link provided and download the first six datasets from the 2014-2015 season. There are seven, but omit the last one, Drives Against, because it lacks useful data for the task at hand.

How to get the data and prep it for analysis

The code, while lengthy, is pretty simple. The first post using this data focused on rushing defense. This analysis features the same coding method, but merges the rushing defense with all the other aspects of the defense and special teams.

A special challenge of this data is how University Edition treats numbers with decimals. For example, if you have a whole number, like 6, in data where numbers are in the format 3.1, University Edition will record it as .6. The best way to fix this is to go variable by variable and apply math to fix the numbers and format the digits correctly.

Next, the main new task is merging these datasets. The first thing to do is sort all your datasets by team, since we want to merge all of these datasets by team. The merge starts the same as any other data step, but instead of using a set statement or an infile statement, you use a merge statement. Then list all the datasets you’ve created.

The next statement will be a by statement. This indicates which variable you merge by. The choice for us is Team; this will allow us to have one row per team. Now we have a dataset with all these variables and one row per team.

The next challenge is making a fantasy point calculator. The values I use are standard scoring format settings. It is very easy to use math to apply values to all your variables to create new variables that indicate fantasy point values for each team’s interception total, fumble total, etc. Adding in the fantasy rank is the last touch.

Code

``````filename Team_D "/folders/myfolders/my_data/NFL Team Defense.csv";
filename SpecT_D "/folders/myfolders/my_data/NFL Special Teams Defense.csv";
filename SpecT_O "/folders/myfolders/my_data/NFL Special Teams Offense.csv";
filename Scor_D "/folders/myfolders/my_data/NFL Scoring Defense.csv";
filename Rush_D "/folders/myfolders/my_data/NFL Rushing Defense.csv";
filename Pass_D "/folders/myfolders/my_data/NFL Passing Defense.csv";

data Team_Defense;
infile Team_D dlm=',' dsd missover firstobs=3;
input Ovr_Rank :2. Team :\$22. G :2. Tot_Pts_A :5. Tot_Yds_A :6. Plays :5.
Tot_Yds_per_Play :3.1 Tot_TOs :3. Fmbl_Rec :3. Fst_Dwns_Pty :4. Fst_Dwns :5.
P_Cmp_A :5. P_Att_A :5. P_Yds_A :6. P_TD_A :3. Int :3. Net_P_Yds_per_Att :6.
P_Fst_Dwns_A :4. R_Att_A :5. R_Yds_A :5. R_TD_A :3. R_Yds_per_Att :3.1
R_Fst_Dwns_A :4. SC_Pct :1. TO_Pct :1. Tot_Exp_Pts_Cont :7.2;

if anydigit(Ovr_Rank)=0 then
delete;

if Tot_Yds_per_Play < 1 then
Tot_Yds_per_Play=Tot_Yds_per_Play * 10;

if R_Yds_per_Att < 1 then
R_Yds_per_Att=R_Yds_per_Att * 10;
drop SC_Pct TO_Pct;
run;

data Special_Teams_Defense;
infile SpecT_D dlm=',' dsd missover firstobs=3;
input SpT_D_Rank :2. Team :\$22. G :2. Punt_Ret_D :4. Punt_Yds_D :4.
Punt_TD_D :2. Punt_Long_D :2. Punt_Yds_per_Ret_D :4.1 Kick_Ret_D :4.
Kick_Yds_D :4. Kick_TD_D :2. Kick_Long_D :2. Kick_Yds_per_Ret_D :4.1
All_Purp_Yds_D :6.;

if anydigit(SpT_D_Rank)=0 then
delete;

if Punt_Yds_per_Ret_D<=1 then
Punt_Yds_per_Ret_D=Punt_Yds_per_Ret_D * 10;

if Kick_Yds_per_Ret_D< 10 then
Kick_Yds_per_Ret_D=Kick_Yds_per_Ret_D * 10;
drop Punt_Long_D Kick_Long_D All_Purp_Yds_D;
run;

data Special_Teams_Offense;
infile SpecT_O dlm=',' dsd missover firstobs=3;
input SpT_O_Rank :2. Team :\$22. G :2. Punt_Ret_O :4. Punt_Yds_O :4.
Punt_TD_O :2. Punt_Long_O :2. Punt_Yds_per_Ret_O :4.1 Kick_Ret_O :4.
Kick_Yds_O :4. Kick_TD_O :2. Kick_Long_O :2. Kick_Yds_per_Ret_O :4.1
All_Purp_Yds_O :6.;

if anydigit(SpT_O_Rank)=0 then
delete;

if Punt_Yds_per_Ret_O<=1 then
Punt_Yds_per_Ret_O=Punt_Yds_per_Ret_O * 10;

if Kick_Yds_per_Ret_O< 10 then
Kick_Yds_per_Ret_O=Kick_Yds_per_Ret_O * 10;
drop Punt_Long_O Kick_Long_O;
run;

data Scoring_Team_Defense;
infile Scor_D dlm=',' dsd missover firstobs=2;
input Scor_D_Rank :2. Team :\$22. G :2. Rsh_TD_A :3. Rec_TD_A :3. PR_TD_A :2.
KR_TD_A :2. Fmbl_TD_Ret :2. Int_TD_Ret :2. Oth_TD :2. All_TD :2. _2_PM :2.
X_Pts_M :4. FG_M :3. Sfty :2. Tot_Pts_Scrd :5. Pts_per_Game :4.1;

if anydigit (Scor_D_Rank)=0 then
delete;

if Fmbl_TD_Ret=. then
Fmbl_TD_Ret=0;

if Int_TD_Ret=. then
Int_TD_Ret=0;

if Sfty=. then
Sfty=0;
drop Oth_TD All_TD _2_PM X_Pts_M FG_M Tot_Pts_Scrd Pts_per_Game PR_TD_A
KR_TD_A Rec_TD_A Rsh_TD_A;
run;

data Rushing_Defense;
infile Rush_D dlm=',' dsd missover firstobs=2;
input R_Rank :2. Team :\$22. G :2. R_ATT :3. R_YDS :4. R_TD :2. R_Lng :5.
R_Y_A :3.1 R_Yds_per_G :5.1 fmb :3. R_Exp_Pts_Cont :6.2;

if anydigit(R_Rank)=0 then
delete;

if R_Yds_per_G < 20 then
R_Yds_per_G=R_Yds_per_G * 10;

if 0 < R_Exp_Pts_Cont < 1 then
R_Exp_Pts_Cont=R_Exp_Pts_Cont * 100;
else
R_Exp_Pts_Cont=R_Exp_Pts_Cont;
drop R_lng fmb R_ATT R_YDS R_TD R_Y_A;
run;

data Passing_Defense;
infile Pass_D dlm=',' dsd missover firstobs=2;
input P_Rank :2. Team :\$22. G :2. P_C :5. P_A :5. P_Cmp_PCT_A :4.2 P_Y :6.
P_TD :3. P_TD_PCT_A :3.1 INT :3. Int_PCT :3.1 P_Lng :2. P_Yds_per_Att :3.1
Adj_P_Yds_per_Att :3.1 P_Yds_per_Cmp :4.1 P_Yds_per_G :5.1 QBR_A :5.1 QBR :1.
Sk :4. Sk_Yds :4. N_Y_A :4.2 Adj_Net_Yds_per_Att :3.1 Sk_PCT :3.1
P_Exp_Pts_Cont :7.2;

if anydigit(P_Rank)=0 then
delete;

if P_Cmp_PCT_A < 1 then
P_Cmp_PCT_A=P_Cmp_PCT_A *100;

if P_TD_PCT_A < 1 then
P_TD_PCT_A=P_TD_PCT_A * 10;

if Int_PCT <1 then
Int_PCT=Int_PCT * 10;

if P_Yds_per_Att< 1 then
P_Yds_per_Att=P_Yds_per_Att * 10;

if P_Yds_per_Cmp < 2 then
P_Yds_per_Cmp=P_Yds_per_Cmp * 10;

if P_Yds_per_G < 30 then
P_Yds_per_G=P_Yds_per_G * 10;

if QBR_A < 15 then
QBR_A=QBR_A * 10;

if Sk_PCT< 1 then
Sk_PCT=Sk_PCT * 10;

if -1 < P_Exp_Pts_Cont < 0 then
P_Exp_Pts_Cont=P_Exp_Pts_Cont * 100;
else
P_Exp_Pts_Cont=P_Exp_Pts_Cont;
drop P_C P_A P_Y P_TD INT P_Lng N_Y_A QBR;
run;

proc sort data=Team_Defense;
by Team;
run;

proc sort data=Special_Teams_Defense;
by Team;
run;

proc sort data=Special_Teams_Offense;
by Team;
run;

proc sort data=Scoring_Team_Defense;
by Team;
run;

proc sort data=Rushing_Defense;
by Team;
run;

proc sort data=Passing_Defense;
by Team;
run;

data The_Merge;
merge Team_Defense Scoring_Team_Defense Special_Teams_Defense
Special_Teams_Offense Rushing_Defense Passing_Defense;
by Team;
run;

data Fantasy_Point_Calculator;
set the_merge;
F_Pts_Frm_Fmb_Rec_per_G=(Fmbl_Rec/G)*2;
F_Pts_Frm_Int_per_G=(Int/G)*2;

if (P_Yds_per_G - Sk_Yds)/G < 100 then
F_P_Yds_per_G=5 - 5;

if 100 <=(P_Yds_A - Sk_Yds)/G < 200 then
F_P_Yds_per_G=3 - 5;

if 200 <=(P_Yds_A - Sk_Yds)/G < 300 then
F_P_Yds_per_G=2 - 5;

if 300 <=(P_Yds_A - Sk_Yds)/G < 350 then
F_P_Yds_per_G=0 - 5;

if 350 <=(P_Yds_A - Sk_Yds)/G < 400 then
F_P_Yds_per_G=-1 - 5;

if 400 <=(P_Yds_A - Sk_Yds)/G < 450 then
F_P_Yds_per_G=-3 - 5;

if 450 <=(P_Yds_A - Sk_Yds)/G < 500 then
F_P_Yds_per_G=-5 - 5;

if 500 <=(P_Yds_A - Sk_Yds)/G < 550 then
F_P_Yds_per_G=-6 - 5;

if 550 <=(P_Yds_A - Sk_Yds)/G then
F_P_Yds_per_G=-7;

if R_Yds_per_G < 100 then
F_R_Yds_per_G=5 - 5;

if 100 <=R_Yds_per_G < 200 then
F_R_Yds_per_G=3 - 5;

if 200 <=R_Yds_per_G < 300 then
F_R_Yds_per_G=2 - 5;

if 300 <=R_Yds_per_G < 350 then
F_R_Yds_per_G=0 - 5;

if 350 <=R_Yds_per_G < 400 then
F_R_Yds_per_G=-1 - 5;

if 400 <=R_Yds_per_G < 450 then
F_R_Yds_per_G=-3 - 5;

if 450 <=R_Yds_per_G < 500 then
F_R_Yds_per_G=-5 - 5;

if 500 <=R_Yds_per_G < 550 then
F_R_Yds_per_G=-6 - 5;

if 550 <=R_Yds_per_G then
F_R_Yds_per_G=-7 - 5;

if Tot_Yds_A / G < 100 then
F_Tot_Yds_A=5;

if 100 <=Tot_Yds_A / G < 200 then
F_Tot_Yds_A=3;

if 200 <=Tot_Yds_A / G < 300 then
F_Tot_Yds_A=2;

if 300 <=Tot_Yds_A / G < 350 then
F_Tot_Yds_A=0;

if 350 <=Tot_Yds_A / G < 400 then
F_Tot_Yds_A=-1;

if 400 <=Tot_Yds_A / G < 450 then
F_Tot_Yds_A=-3;

if 450 <=Tot_Yds_A / G < 500 then
F_Tot_Yds_A=-5;

if 500 <=Tot_Yds_A / G < 550 then
F_Tot_Yds_A=-6;

if 550 <=Tot_Yds_A / G then
F_Tot_Yds_A=-7;

if R_TD_A / G * 6=0 then
R_F_Pts_per_G=5;

if 1 <=R_TD_A / G * 6 < 6 then
R_F_Pts_per_G=4;

if 6 <=R_TD_A / G * 6 < 13 then
R_F_Pts_per_G=3;

if 13 <=R_TD_A / G * 6 < 17 then
R_F_Pts_per_G=1;

if 17 <=R_TD_A / G * 6 < 27 then
R_F_Pts_per_G=0;

if 27 <=R_TD_A / G * 6 < 34 then
R_F_Pts_per_G=-1;

if 34 <=R_TD_A / G * 6 < 45 then
R_F_Pts_per_G=-3;

if 45 <=R_TD_A / G * 6 then
R_F_Pts_per_G=-5;

if P_TD_A / G * 6=0 then
P_F_Pts_per_G=5;

if 1 <=P_TD_A / G * 6 < 6 then
P_F_Pts_per_G=4;

if 6 <=P_TD_A / G * 6 < 13 then
P_F_Pts_per_G=3;

if 13 <=P_TD_A / G * 6 < 17 then
P_F_Pts_per_G=1;

if 17 <=P_TD_A / G * 6 < 27 then
P_F_Pts_per_G=0;

if 27 <=P_TD_A / G * 6 < 34 then
P_F_Pts_per_G=-1;

if 34 <=P_TD_A / G * 6 < 45 then
P_F_Pts_per_G=-3;

if 45 <=P_TD_A / G * 6 then
P_F_Pts_per_G=-5;

if Tot_Pts_A / G=0 then
Tot_F_Pts_per_G=5;

if 1 <=Tot_Pts_A / G < 6 then
Tot_F_Pts_per_G=4;

if 6 <=Tot_Pts_A / G < 13 then
Tot_F_Pts_per_G=3;

if 13 <=Tot_Pts_A / G < 17 then
Tot_F_Pts_per_G=1;

if 17 <=Tot_Pts_A / G < 27 then
Tot_F_Pts_per_G=0;

if 27 <=Tot_Pts_A / G < 34 then
Tot_F_Pts_per_G=-1;

if 34 <=Tot_Pts_A / G < 45 then
Tot_F_Pts_per_G=-3;

if 45 <=Tot_Pts_A / G then
Tot_F_Pts_per_G=-5;
F_Pts_Sk_per_G=Sk / G;
F_Pts_Sfty_per_G=Sfty / G;
F_Pts_Fmbl_TD_per_G=Fmbl_TD_Ret * 6 / G;
F_Pts_Int_TD_per_G=Int_TD_Ret * 6 / G;
F_Punt_TD_per_G=Punt_TD_O * 6 / G;
F_Kick_TD_per_G=Kick_TD_O * 6 / G;
Tot_F_Score_per_G=F_Kick_TD_per_G + F_Punt_TD_per_G + F_Pts_Int_TD_per_G + F_Pts_Fmbl_TD_per_G + F_Pts_Sfty_per_G +
F_Pts_Sk_per_G + Tot_F_Pts_per_G + F_Tot_Yds_A + F_Pts_Frm_Int_per_G + F_Pts_Frm_Fmb_Rec_per_G;
run;

proc sort data=fantasy_point_calculator;
by descending Tot_F_Score_per_G;
run;

retain Team F_Rank Tot_F_Score_per_G;
set fantasy_point_calculator;
F_Rank +1;
run;

run;

bubble x=r_rank y=p_rank size=f_rank;
run;

quit;``````

What we’re analyzing

Notice the proc corr statement doesn’t feature a var statement. This allows us to compare all variables instead of the ones specified in the var statement. These data also may not be 100% accurate, as they do not include the kick and punt blocking data, which are used in the standard scoring format.

What does the output mean?

The correlation matrix nicely displays which variables may strongly correlate with fantasy success, but aren’t a direct contributor. For example, do punt return yards correlate well with fantasy rank? If so, maybe that’s the secret statistic that gives you the fantasy edge in your league.

The bubble plot shows the pass rank and rush rank, while the bubble size is determined by fantasy rank; smallest bubble represents highest ranked. The correlation matrix is not included as it is too large for this article, but you can make your own and explore the areas that most interest you.

Comment with anything interesting you noticed in this data or if this article helped you pick a fantasy defense. And let us know about other interesting ways to use this data.

Need data for learning?

The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:

We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:

Click Analytics U, then select "Subscribe" from the Options menu.

Happy Learning!

Contributors