Godzilla_Hat
Obsidian | Level 7

Hi everyone, I'm looking at NFL field goal kicking and comparing the players.

The dataset I have is the top 10 players with their field goal percentages. I have their percentages for the 2015-2020 seasons as well as each player's average.

What statistical procedures/functions could I use when looking at this dataset?

I'm a bit lost on what I should do, so any help would be much appreciated!

The blank sections of the data are due to players sitting out for a season or two.

data topNFL; /* top players: field goal % by season (S20 = 2020 ... S15 = 2015) and their averages */
   infile datalines dlm=',';
   input player $ S20 S19 S18 S17 S16 S15 Average;
datalines;
Gano ,96.88,.,87.50,96.67,78.95,83.33,83.94,
Boswell ,95.00,93.55,65.00,92.11,84.00,90.63,88.89,
Koo ,94.87,88.46,50.00,.,.,.,88.64,
Carlson ,94.29,73.08,85.32,.,.,.,85.32,
Santos ,93.75,44.44,77.78,80.00,88.57,81.08,83.24,
Folk,92.86,82.35,.,54.55,87.10,81.25,82.29,
Butker ,92.59,89.47,88.89,90.48,.,.,89.47,
Sanders ,92.31,76.67,90.00,.,.,.,84.4,
Tucker ,90.70,96.55,89.74,91.89,97.44,82.50,90.75,
Succop ,90.32,16.67,86.67,83.33,91.67,87.50,82.99,
;
proc sort data=topNFL;
   by descending Average;
run;

proc print data=topNFL;
run;
3 REPLIES
PaigeMiller
Diamond | Level 26

... and comparing the players ...

What statistical procedures/functions could I use when looking at this dataset?

 

Please state clearly what question you want to answer using this data set. What is the criterion for doing this comparison?

--
Paige Miller
Ksharp
Super User
This looks like an unsupervised learning problem.
Try principal component analysis; see @Rick_SAS's blog:
https://blogs.sas.com/content/iml/2014/11/07/distribution-of-blood-types.html

Or try cluster analysis:
proc cluster
proc fastclus
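
A minimal sketch of the clustering suggestion, assuming the topNFL dataset from the question. Ward's method, the two-cluster cut, and the output dataset names (tree, clusters) are illustrative choices, not recommendations. Note that PROC CLUSTER excludes observations with missing values, so with these data only the four players who have all six seasons (Boswell, Santos, Tucker, Succop) would be clustered.

```sas
/* Sketch only: hierarchical clustering on the six season percentages. */
/* Players with any missing season are dropped by PROC CLUSTER.        */
proc cluster data=topNFL method=ward outtree=tree noprint;
   var S20 S19 S18 S17 S16 S15;
   id player;
run;

/* Cut the tree into two clusters (an arbitrary choice for illustration). */
proc tree data=tree nclusters=2 out=clusters noprint;
run;

proc print data=clusters;
run;
```

With so few complete cases, any grouping this produces should be read as exploratory at best.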
SteveDenham
Jade | Level 19

Everyone sees something in data based on what they do the most. If I were faced with this dataset, I would consider some sort of repeated measures analysis using a generalized linear model. For instance, if you were looking for estimates for each player, as a conditional value over time, you might try:

 

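/* Reshape to one row per player and season; percentages become proportions */
data long;
   set topnfl;
   year=20; value=s20/100; output;
   year=19; value=s19/100; output;
   year=18; value=s18/100; output;
   year=17; value=s17/100; output;
   year=16; value=s16/100; output;
   year=15; value=s15/100; output;
   keep player year value;
run;

proc glimmix data=long;
   class player year;
   nloptions maxiter=1000;
   /* binomial response; denominator degrees of freedom fixed at 32 */
   model value=player year / dist=bin ddf=32;
   /* R-side AR(1) structure over years, separate covariance parameters per player */
   random year / residual type=ar(1) group=player;
   /* player estimates and pairwise differences, back-transformed with ILINK */
   lsmeans player / diff ilink;
run;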

I set the denominator degrees of freedom to 32, which is what the 'skeleton' ANOVA table (no RANDOM statement) provides in PROC GLM. I used the binomial distribution because the measures are the result of an undetermined number of Bernoulli trials (either made or missed).
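
As a rough check on the DDF=32 value, the skeleton model could be sketched like this, assuming the long dataset created above. The data contain 47 non-missing player-seasons, so after 1 intercept, 9 player, and 5 year degrees of freedom the Error line should show 47 - 1 - 9 - 5 = 32.

```sas
/* Skeleton ANOVA: same fixed effects as the GLIMMIX model, no RANDOM statement. */
/* Rows with a missing value are excluded from the fit automatically.            */
proc glm data=long;
   class player year;
   model value = player year;
run;
quit;
```

Only the degrees of freedom are of interest here; the F tests from this fit ignore both the binomial response and the correlation between seasons.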

 

SteveDenham
