We’re smarter together. Learn from this collection of community knowledge and add your expertise.

Picking your new fantasy football defense? Here’s how University Edition can help

by SAS Employee cakramer on ‎10-02-2015 01:06 PM - edited on ‎10-20-2015 03:30 PM by Community Manager (1,554 Views)

 

 

FantasyFootball.jpg

 

 

What’s this data?

 

This post, our third in the Free Data Friday series, revisits the data set from Pro-Football-Reference. Using University Edition, you can make informed decisions about your fantasy defense.

 

How to download

 

If you don’t already have University Edition, get it here and, follow the instructions from the pdf carefully. If you need help with almost any aspect of using University Edition, check out these video tutorialsFreeDataFriday_graphic.jpg

 

To download this data, go to the link provided and download the first six datasets from the 2014-2015 season. There are seven, but omit the last one, Drives Against, because it lacks useful data for the task at hand.

 

How to get the data and prep it for analysis

 

The code, while lengthy, is pretty simple. The first post using this data focused on rushing defense. This analysis features the same coding method, but merges the rushing defense with all the other aspects of the defense and special teams.

 

A special challenge of this data is how University Edition treats numbers with decimals. For example, if you have a whole number, like 6, in data where numbers are in the format 3.1, University Edition will record it as .6. The best way to fix this is to go variable by variable and apply math to fix the numbers and format the digits correctly.

 

Next, the main new task is merging these datasets. The first thing to do is sort all your datasets by team, since we want to merge all of these datasets by team. The merge starts the same as any other data step, but instead of using a set statement or an infile statement, you use a merge statement. Then list all the datasets you’ve created.

 

The next statement will be a by statement. This indicates which variable you merge by. The choice for us is Team; this will allow us to have one row per team. Now we have a dataset with all these variables and one row per team.

 

Create your fantasy point calculator

 

The next challenge is making a fantasy point calculator. The values I use are standard scoring format settings. It is very easy to use math to apply values to all your variables to create new variables that indicate fantasy point values for each team’s interception total, fumble total, etc. Adding in the fantasy rank is the last touch.

 

Code

filename Team_D "/folders/myfolders/my_data/NFL Team Defense.csv";
filename SpecT_D "/folders/myfolders/my_data/NFL Special Teams Defense.csv";
filename SpecT_O "/folders/myfolders/my_data/NFL Special Teams Offense.csv";
filename Scor_D "/folders/myfolders/my_data/NFL Scoring Defense.csv";
filename Rush_D "/folders/myfolders/my_data/NFL Rushing Defense.csv";
filename Pass_D "/folders/myfolders/my_data/NFL Passing Defense.csv";

data Team_Defense;
	infile Team_D dlm=',' dsd missover firstobs=3;
	input Ovr_Rank :2. Team :$22. G :2. Tot_Pts_A :5. Tot_Yds_A :6. Plays :5. 
		Tot_Yds_per_Play :3.1 Tot_TOs :3. Fmbl_Rec :3. Fst_Dwns_Pty :4. Fst_Dwns :5. 
		P_Cmp_A :5. P_Att_A :5. P_Yds_A :6. P_TD_A :3. Int :3. Net_P_Yds_per_Att :6. 
		P_Fst_Dwns_A :4. R_Att_A :5. R_Yds_A :5. R_TD_A :3. R_Yds_per_Att :3.1 
		R_Fst_Dwns_A :4. SC_Pct :1. TO_Pct :1. Tot_Exp_Pts_Cont :7.2;

	if anydigit(Ovr_Rank)=0 then
		delete;

	if Tot_Yds_per_Play < 1 then
		Tot_Yds_per_Play=Tot_Yds_per_Play * 10;

	if R_Yds_per_Att < 1 then
		R_Yds_per_Att=R_Yds_per_Att * 10;
	drop SC_Pct TO_Pct;
run;

data Special_Teams_Defense;
	infile SpecT_D dlm=',' dsd missover firstobs=3;
	input SpT_D_Rank :2. Team :$22. G :2. Punt_Ret_D :4. Punt_Yds_D :4. 
		Punt_TD_D :2. Punt_Long_D :2. Punt_Yds_per_Ret_D :4.1 Kick_Ret_D :4. 
		Kick_Yds_D :4. Kick_TD_D :2. Kick_Long_D :2. Kick_Yds_per_Ret_D :4.1 
		All_Purp_Yds_D :6.;

	if anydigit(SpT_D_Rank)=0 then
		delete;

	if Punt_Yds_per_Ret_D<=1 then
		Punt_Yds_per_Ret_D=Punt_Yds_per_Ret_D * 10;

	if Kick_Yds_per_Ret_D< 10 then
		Kick_Yds_per_Ret_D=Kick_Yds_per_Ret_D * 10;
	drop Punt_Long_D Kick_Long_D All_Purp_Yds_D;
run;

data Special_Teams_Offense;
	infile SpecT_O dlm=',' dsd missover firstobs=3;
	input SpT_O_Rank :2. Team :$22. G :2. Punt_Ret_O :4. Punt_Yds_O :4. 
		Punt_TD_O :2. Punt_Long_O :2. Punt_Yds_per_Ret_O :4.1 Kick_Ret_O :4. 
		Kick_Yds_O :4. Kick_TD_O :2. Kick_Long_O :2. Kick_Yds_per_Ret_O :4.1 
		All_Purp_Yds_O :6.;

	if anydigit(SpT_O_Rank)=0 then
		delete;

	if Punt_Yds_per_Ret_O<=1 then
		Punt_Yds_per_Ret_O=Punt_Yds_per_Ret_O * 10;

	if Kick_Yds_per_Ret_O< 10 then
		Kick_Yds_per_Ret_O=Kick_Yds_per_Ret_O * 10;
	drop Punt_Long_O Kick_Long_O;
run;

data Scoring_Team_Defense;
	infile Scor_D dlm=',' dsd missover firstobs=2;
	input Scor_D_Rank :2. Team :$22. G :2. Rsh_TD_A :3. Rec_TD_A :3. PR_TD_A :2. 
		KR_TD_A :2. Fmbl_TD_Ret :2. Int_TD_Ret :2. Oth_TD :2. All_TD :2. _2_PM :2. 
		X_Pts_M :4. FG_M :3. Sfty :2. Tot_Pts_Scrd :5. Pts_per_Game :4.1;

	if anydigit (Scor_D_Rank)=0 then
		delete;

	if Fmbl_TD_Ret=. then
		Fmbl_TD_Ret=0;

	if Int_TD_Ret=. then
		Int_TD_Ret=0;

	if Sfty=. then
		Sfty=0;
	drop Oth_TD All_TD _2_PM X_Pts_M FG_M Tot_Pts_Scrd Pts_per_Game PR_TD_A 
		KR_TD_A Rec_TD_A Rsh_TD_A;
run;

data Rushing_Defense;
	infile Rush_D dlm=',' dsd missover firstobs=2;
	input R_Rank :2. Team :$22. G :2. R_ATT :3. R_YDS :4. R_TD :2. R_Lng :5. 
		R_Y_A :3.1 R_Yds_per_G :5.1 fmb :3. R_Exp_Pts_Cont :6.2;

	if anydigit(R_Rank)=0 then
		delete;

	if R_Yds_per_G < 20 then
		R_Yds_per_G=R_Yds_per_G * 10;

	if 0 < R_Exp_Pts_Cont < 1 then
		R_Exp_Pts_Cont=R_Exp_Pts_Cont * 100;
	else
		R_Exp_Pts_Cont=R_Exp_Pts_Cont;
	drop R_lng fmb R_ATT R_YDS R_TD R_Y_A;
run;

data Passing_Defense;
	infile Pass_D dlm=',' dsd missover firstobs=2;
	input P_Rank :2. Team :$22. G :2. P_C :5. P_A :5. P_Cmp_PCT_A :4.2 P_Y :6. 
		P_TD :3. P_TD_PCT_A :3.1 INT :3. Int_PCT :3.1 P_Lng :2. P_Yds_per_Att :3.1 
		Adj_P_Yds_per_Att :3.1 P_Yds_per_Cmp :4.1 P_Yds_per_G :5.1 QBR_A :5.1 QBR :1. 
		Sk :4. Sk_Yds :4. N_Y_A :4.2 Adj_Net_Yds_per_Att :3.1 Sk_PCT :3.1 
		P_Exp_Pts_Cont :7.2;

	if anydigit(P_Rank)=0 then
		delete;

	if P_Cmp_PCT_A < 1 then
		P_Cmp_PCT_A=P_Cmp_PCT_A *100;

	if P_TD_PCT_A < 1 then
		P_TD_PCT_A=P_TD_PCT_A * 10;

	if Int_PCT <1 then
		Int_PCT=Int_PCT * 10;

	if P_Yds_per_Att< 1 then
		P_Yds_per_Att=P_Yds_per_Att * 10;

	if Adj_P_Yds_per_Att< 1 then
		Adj_P_Yds_per_Att=Adj_P_Yds_per_Att * 10;

	if P_Yds_per_Cmp < 2 then
		P_Yds_per_Cmp=P_Yds_per_Cmp * 10;

	if P_Yds_per_G < 30 then
		P_Yds_per_G=P_Yds_per_G * 10;

	if QBR_A < 15 then
		QBR_A=QBR_A * 10;

	if Adj_Net_Yds_per_Att < 1 then
		Adj_Net_Yds_per_Att=Adj_Net_Yds_per_Att * 10;

	if Sk_PCT< 1 then
		Sk_PCT=Sk_PCT * 10;

	if -1 < P_Exp_Pts_Cont < 0 then
		P_Exp_Pts_Cont=P_Exp_Pts_Cont * 100;
	else
		P_Exp_Pts_Cont=P_Exp_Pts_Cont;
	drop P_C P_A P_Y P_TD INT P_Lng N_Y_A QBR;
run;

proc sort data=Team_Defense;
	by Team;
run;

proc sort data=Special_Teams_Defense;
	by Team;
run;

proc sort data=Special_Teams_Offense;
	by Team;
run;

proc sort data=Scoring_Team_Defense;
	by Team;
run;

proc sort data=Rushing_Defense;
	by Team;
run;

proc sort data=Passing_Defense;
	by Team;
run;

data The_Merge;
	merge Team_Defense Scoring_Team_Defense Special_Teams_Defense 
		Special_Teams_Offense Rushing_Defense Passing_Defense;
	by Team;
run;

data Fantasy_Point_Calculator;
	set the_merge;
	F_Pts_Frm_Fmb_Rec_per_G=(Fmbl_Rec/G)*2;
	F_Pts_Frm_Int_per_G=(Int/G)*2;

	if (P_Yds_per_G - Sk_Yds)/G < 100 then
		F_P_Yds_per_G=5 - 5;

	if 100 <=(P_Yds_A - Sk_Yds)/G < 200 then
		F_P_Yds_per_G=3 - 5;

	if 200 <=(P_Yds_A - Sk_Yds)/G < 300 then
		F_P_Yds_per_G=2 - 5;

	if 300 <=(P_Yds_A - Sk_Yds)/G < 350 then
		F_P_Yds_per_G=0 - 5;

	if 350 <=(P_Yds_A - Sk_Yds)/G < 400 then
		F_P_Yds_per_G=-1 - 5;

	if 400 <=(P_Yds_A - Sk_Yds)/G < 450 then
		F_P_Yds_per_G=-3 - 5;

	if 450 <=(P_Yds_A - Sk_Yds)/G < 500 then
		F_P_Yds_per_G=-5 - 5;

	if 500 <=(P_Yds_A - Sk_Yds)/G < 550 then
		F_P_Yds_per_G=-6 - 5;

	if 550 <=(P_Yds_A - Sk_Yds)/G then
		F_P_Yds_per_G=-7;

	if R_Yds_per_G < 100 then
		F_R_Yds_per_G=5 - 5;

	if 100 <=R_Yds_per_G < 200 then
		F_R_Yds_per_G=3 - 5;

	if 200 <=R_Yds_per_G < 300 then
		F_R_Yds_per_G=2 - 5;

	if 300 <=R_Yds_per_G < 350 then
		F_R_Yds_per_G=0 - 5;

	if 350 <=R_Yds_per_G < 400 then
		F_R_Yds_per_G=-1 - 5;

	if 400 <=R_Yds_per_G < 450 then
		F_R_Yds_per_G=-3 - 5;

	if 450 <=R_Yds_per_G < 500 then
		F_R_Yds_per_G=-5 - 5;

	if 500 <=R_Yds_per_G < 550 then
		F_R_Yds_per_G=-6 - 5;

	if 550 <=R_Yds_per_G then
		F_R_Yds_per_G=-7 - 5;

	if Tot_Yds_A / G < 100 then
		F_Tot_Yds_A=5;

	if 100 <=Tot_Yds_A / G < 200 then
		F_Tot_Yds_A=3;

	if 200 <=Tot_Yds_A / G < 300 then
		F_Tot_Yds_A=2;

	if 300 <=Tot_Yds_A / G < 350 then
		F_Tot_Yds_A=0;

	if 350 <=Tot_Yds_A / G < 400 then
		F_Tot_Yds_A=-1;

	if 400 <=Tot_Yds_A / G < 450 then
		F_Tot_Yds_A=-3;

	if 450 <=Tot_Yds_A / G < 500 then
		F_Tot_Yds_A=-5;

	if 500 <=Tot_Yds_A / G < 550 then
		F_Tot_Yds_A=-6;

	if 550 <=Tot_Yds_A / G then
		F_Tot_Yds_A=-7;

	if R_TD_A / G * 6=0 then
		R_F_Pts_per_G=5;

	if 1 <=R_TD_A / G * 6 < 6 then
		R_F_Pts_per_G=4;

	if 6 <=R_TD_A / G * 6 < 13 then
		R_F_Pts_per_G=3;

	if 13 <=R_TD_A / G * 6 < 17 then
		R_F_Pts_per_G=1;

	if 17 <=R_TD_A / G * 6 < 27 then
		R_F_Pts_per_G=0;

	if 27 <=R_TD_A / G * 6 < 34 then
		R_F_Pts_per_G=-1;

	if 34 <=R_TD_A / G * 6 < 45 then
		R_F_Pts_per_G=-3;

	if 45 <=R_TD_A / G * 6 then
		R_F_Pts_per_G=-5;

	if P_TD_A / G * 6=0 then
		P_F_Pts_per_G=5;

	if 1 <=P_TD_A / G * 6 < 6 then
		P_F_Pts_per_G=4;

	if 6 <=P_TD_A / G * 6 < 13 then
		P_F_Pts_per_G=3;

	if 13 <=P_TD_A / G * 6 < 17 then
		P_F_Pts_per_G=1;

	if 17 <=P_TD_A / G * 6 < 27 then
		P_F_Pts_per_G=0;

	if 27 <=P_TD_A / G * 6 < 34 then
		P_F_Pts_per_G=-1;

	if 34 <=P_TD_A / G * 6 < 45 then
		P_F_Pts_per_G=-3;

	if 45 <=P_TD_A / G * 6 then
		P_F_Pts_per_G=-5;

	if Tot_Pts_A / G=0 then
		Tot_F_Pts_per_G=5;

	if 1 <=Tot_Pts_A / G < 6 then
		Tot_F_Pts_per_G=4;

	if 6 <=Tot_Pts_A / G < 13 then
		Tot_F_Pts_per_G=3;

	if 13 <=Tot_Pts_A / G < 17 then
		Tot_F_Pts_per_G=1;

	if 17 <=Tot_Pts_A / G < 27 then
		Tot_F_Pts_per_G=0;

	if 27 <=Tot_Pts_A / G < 34 then
		Tot_F_Pts_per_G=-1;

	if 34 <=Tot_Pts_A / G < 45 then
		Tot_F_Pts_per_G=-3;

	if 45 <=Tot_Pts_A / G then
		Tot_F_Pts_per_G=-5;
	F_Pts_Sk_per_G=Sk / G;
	F_Pts_Sfty_per_G=Sfty / G;
	F_Pts_Fmbl_TD_per_G=Fmbl_TD_Ret * 6 / G;
	F_Pts_Int_TD_per_G=Int_TD_Ret * 6 / G;
	F_Punt_TD_per_G=Punt_TD_O * 6 / G;
	F_Kick_TD_per_G=Kick_TD_O * 6 / G;
	Tot_F_Score_per_G=F_Kick_TD_per_G + F_Punt_TD_per_G + F_Pts_Int_TD_per_G + F_Pts_Fmbl_TD_per_G + F_Pts_Sfty_per_G + 
                                    F_Pts_Sk_per_G + Tot_F_Pts_per_G + F_Tot_Yds_A + F_Pts_Frm_Int_per_G + F_Pts_Frm_Fmb_Rec_per_G;
run;

proc sort data=fantasy_point_calculator;
	by descending Tot_F_Score_per_G;
run;

data add_fantasy_rank;
	retain Team F_Rank Tot_F_Score_per_G;
	set fantasy_point_calculator;
	F_Rank +1;
run;

proc corr data=add_fantasy_rank;
run;

proc sgplot data=add_fantasy_rank;
	bubble x=r_rank y=p_rank size=f_rank;
run;

quit;

What we’re analyzing

 

Notice the proc corr statement doesn’t feature a var statement. This allows us to compare all variables instead of the ones specified in the var statement. These data also may not be 100% accurate, as they do not include the kick and punt blocking data, which are used in the standard scoring format.

 

Post 3 Bubble Plot.png

 

What does the output mean?

 

The correlation matrix nicely displays which variables may strongly correlate with fantasy success, but aren’t a direct contributor. For example, do punt return yards correlate well with fantasy rank? If so, maybe that’s the secret statistic that gives you the fantasy edge in your league.

 

The bubble plot shows the pass rank and rush rank, while the bubble size is determined by fantasy rank; smallest bubble represents highest ranked. The correlation matrix is not included as it is too large for this article, but you can make your own and explore the areas that most interest you.

 

Now it’s your turn!

Comment with anything interesting you noticed in this data or if this article helped you pick a fantasy defense. And let us know about other interesting ways to use this data.

 

Need data for learning?

 

The SAS Communities Library has a growing supply of free data sources that you can use in your training to become a data scientist. The easiest way to find articles about data sources is to type "Data for learning" in the communities site search field like so:

 Search.PNG

We publish all articles about free data sources under the Analytics U label in the SAS Communities Library. Want email notifications when we add new content? Subscribe to the Analytics U label by clicking "Find A Community" in the right nav and selecting SAS Communities Library at the bottom of the list. In the Labels box in the right nav, click Analytics U:

 

AnalyticsU.PNG

 

Click Analytics U, then select "Subscribe" from the Options menu.

 

Happy Learning!

Your turn
Sign In!

Want to write an article? Sign in with your profile.


Looking for the Ask the Expert series? Find it in its new home: communities.sas.com/askexpert.