Godzilla_Hat

Hi everyone, I'm looking at NFL field goal kicking and comparing the players.

My dataset contains the top 10 kickers and the percentage of field goals each one made in every season from 2015 to 2020, plus their average.

What statistical procedures or functions could I use to analyze this dataset?

I'm a bit lost on what to do, so any help would be much appreciated!

The missing values (.) in the data are seasons that a player sat out.

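data topNFL; /* top players and their field goal percentages */
infile datalines dlm=',';  /* comma-delimited; '.' marks a season not played */
input player $ S20 S19 S18 S17 S16 S15 Average;  /* S20 = 2020 season ... S15 = 2015 season */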
datalines;
Gano ,96.88,.,87.50,96.67,78.95,83.33,83.94,
Boswell ,95.00,93.55,65.00,92.11,84.00,90.63,88.89,
Koo ,94.87,88.46,50.00,.,.,.,88.64,
Carlson ,94.29,73.08,85.32,.,.,.,85.32,
Santos ,93.75,44.44,77.78,80.00,88.57,81.08,83.24,
Folk,92.86,82.35,.,54.55,87.10,81.25,82.29,
Butker ,92.59,89.47,88.89,90.48,.,.,89.47,
Sanders ,92.31,76.67,90.00,.,.,.,84.4,
Tucker ,90.70,96.55,89.74,91.89,97.44,82.50,90.75,
Succop ,90.32,16.67,86.67,83.33,91.67,87.50,82.99,
;
/* list players from highest to lowest average */
proc sort data=topNFL;
   by descending average;
run;

proc print data=topNFL;
run;
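One possible first step, before any modeling, is a per-player summary of the season percentages. The sketch below assumes the topNFL data set from the DATA step above already exists; the output data set name player_summary and its variable names are made up for illustration. The MEAN, STD, MIN, MAX and N functions skip missing values, so seasons a player sat out are simply ignored.

```sas
/* Sketch: per-player summary of the six season percentages.           */
/* Assumes topNFL (created above) exists; player_summary is a made-up  */
/* name. These functions ignore missing values (seasons sat out).      */
data player_summary;
   set topNFL;
   n_seasons = n(of S20 S19 S18 S17 S16 S15);     /* seasons played          */
   mean_pct  = mean(of S20 S19 S18 S17 S16 S15);  /* mean of listed seasons  */
   sd_pct    = std(of S20 S19 S18 S17 S16 S15);   /* season-to-season spread */
   min_pct   = min(of S20 S19 S18 S17 S16 S15);   /* worst season            */
   max_pct   = max(of S20 S19 S18 S17 S16 S15);   /* best season             */
run;

/* show the computed mean next to the Average column from the data */
proc print data=player_summary;
   var player n_seasons mean_pct sd_pct min_pct max_pct Average;
run;
```

Printing mean_pct next to Average makes it easy to see whether the supplied Average matches the mean of the listed seasons for each player.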
PaigeMiller

... and comparing the players ...

What statistical procedures/functions could I use when looking at this dataset?

 

Please state clearly what question you want to answer using this data set. What is the criterion for doing this comparison?

--
Paige Miller
Ksharp
This looks like an unsupervised learning problem.
Try principal component analysis; see @Rick_SAS's blog:
https://blogs.sas.com/content/iml/2014/11/07/distribution-of-blood-types.html

Or try cluster analysis:
PROC CLUSTER
PROC FASTCLUS
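As a rough illustration of the cluster-analysis suggestion, here is a minimal k-means sketch with PROC FASTCLUS, assuming the topNFL data set from the original post exists. The choice of three clusters and the output name clus are arbitrary assumptions for illustration, not a recommendation. Several players have missing seasons, so check the FASTCLUS documentation (for example, the NOMISS option) for how observations with missing values are handled before trusting the grouping.

```sas
/* Sketch: k-means clustering of players on their season percentages. */
/* Assumes topNFL exists. MAXCLUSTERS=3 and OUT=clus are arbitrary     */
/* illustrative choices. Average is left out so it is not counted      */
/* twice alongside the seasons it summarizes.                          */
proc fastclus data=topNFL maxclusters=3 out=clus;
   var S20 S19 S18 S17 S16 S15;
run;

/* OUT= adds CLUSTER (assigned cluster) and DISTANCE (to its seed) */
proc sort data=clus;
   by cluster;
run;

proc print data=clus;
   by cluster;
   var player distance;
run;
```

With only 10 players the clusters will be unstable, so this is better read as an exploratory picture than as a formal comparison.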
SteveDenham

Everyone sees something in data based on what they do the most.  If I were faced with this dataset, I would consider a repeated measures analysis using a generalized linear model.  For instance, if you are looking for an estimate for each player, as a conditional value over time, you might try:

 

data long;
set topnfl;
/* reshape wide to long: one row per player-season, percentage rescaled to a proportion */
year=20;value=s20/100;output;
year=19;value=s19/100;output;
year=18;value=s18/100;output;
year=17;value=s17/100;output;
year=16;value=s16/100;output;
year=15;value=s15/100;output;
keep player year value;
run;

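proc glimmix data=long;
class player year;
nloptions maxiter=1000;                        /* allow more optimizer iterations */
model value=player year/dist=bin ddf=32;       /* binomial response; fixed player and year effects */
random year/residual type=ar(1) group=player;  /* R-side AR(1) over seasons; GROUP= lets parameters differ by player */
lsmeans player/diff ilink;                     /* player LS-means, pairwise differences, inverse-link scale */
run;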

I set the denominator degrees of freedom to 32, which is what the 'skeleton' ANOVA table (no RANDOM statement) provides in PROC GLM. I used the binomial distribution because the measures are the result of an undetermined number of Bernoulli trials (either made or missed).
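As a check on the 32, here is the arithmetic behind it, counting the nonmissing player-seasons in the data in the order Gano, Boswell, Koo, Carlson, Santos, Folk, Butker, Sanders, Tucker, Succop. Rows with a missing value are dropped by PROC GLM, and the model uses 1 degree of freedom for the intercept, 9 for the 10 players and 5 for the 6 seasons.

```latex
\begin{aligned}
N &= 5 + 6 + 3 + 3 + 6 + 5 + 4 + 3 + 6 + 6 = 47,\\
\mathrm{df}_{\text{error}} &= N - 1 - (10 - 1) - (6 - 1) = 47 - 1 - 9 - 5 = 32.
\end{aligned}
```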

 

SteveDenham
