Re: I need help with analyzing this dataset and knowing which statistical procedures I should use.

Godzilla_Hat · Posted 12-01-2021 12:06 AM

Hi everyone, I'm looking at NFL field goal kicks and comparing the players.

The dataset I have is the top 10 players with their field goal percentages. I have their percentages for the 2015 through 2020 seasons, as well as each player's average.

What statistical procedures/functions could I use when looking at this dataset?

I'm a bit lost on what I should do, so any help would be much appreciated!

The missing values (shown as . in the data) are due to players sitting out for a season or two.

data topNFL; /* top players and their averages */
   infile datalines dlm=',';
   input player $ S20 S19 S18 S17 S16 S15 Average;
datalines;
Gano ,96.88,.,87.50,96.67,78.95,83.33,83.94,
Boswell ,95.00,93.55,65.00,92.11,84.00,90.63,88.89,
Koo ,94.87,88.46,50.00,.,.,.,88.64,
Carlson ,94.29,73.08,85.32,.,.,.,85.32,
Santos ,93.75,44.44,77.78,80.00,88.57,81.08,83.24,
Folk,92.86,82.35,.,54.55,87.10,81.25,82.29,
Butker ,92.59,89.47,88.89,90.48,.,.,89.47,
Sanders ,92.31,76.67,90.00,.,.,.,84.4,
Tucker ,90.70,96.55,89.74,91.89,97.44,82.50,90.75,
Succop ,90.32,16.67,86.67,83.33,91.67,87.50,82.99,
;
proc sort data=topNFL;
   by descending Average;
run;

proc print data=topNFL;
run;

PaigeMiller · Posted 12-01-2021 06:07 AM

... and comparing the players ...

What statistical procedures/functions could I use when looking at this dataset ?

Please state clearly what question you want to answer using this data set. What is the criterion for doing this comparison?

--
Paige Miller

Ksharp · Posted 12-01-2021 07:14 AM

It looks like an unsupervised learning problem.
Try principal component analysis; see @Rick_SAS's blog:
https://blogs.sas.com/content/iml/2014/11/07/distribution-of-blood-types.html

Or try cluster analysis:
proc cluster
proc fastclus
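
As a rough sketch of the clustering suggestion, assuming the topNFL data set from the first post: the code below groups the kickers on their six season percentages with PROC FASTCLUS. The choice of three clusters and the output data set name nfl_clusters are assumptions made only for illustration, not something the data dictates.

```sas
/* Sketch only: k-means style clustering of the kickers on their season percentages. */
/* MAXCLUSTERS=3 is an arbitrary illustrative choice; nfl_clusters is a made-up name. */
proc fastclus data=topNFL maxclusters=3 out=nfl_clusters;
   var S20 S19 S18 S17 S16 S15;
   id player;
run;

/* The OUT= data set adds CLUSTER (assigned cluster) and DISTANCE (distance to its seed). */
proc print data=nfl_clusters;
   var player cluster distance;
run;
```

Only four of the ten players (Boswell, Santos, Tucker, Succop) have all six seasons. As far as I know, PROC FASTCLUS keeps partially missing rows unless NOMISS is specified and adjusts the distances for the missing values, whereas PROC CLUSTER drops rows with missing values, so it may be worth checking the documentation before relying on either.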

SteveDenham · Posted 12-01-2021 11:03 AM

Everyone sees something in data based on what they do the most. If I were faced with this dataset, I would consider some sort of repeated measures using a generalized linear model. For instance, if you were looking for estimates for each player, as a conditional value over time, you might try:

data long;
set topnfl;
year=20;value=s20/100;output;
year=19;value=s19/100;output;
year=18;value=s18/100;output;
year=17;value=s17/100;output;
year=16;value=s16/100;output;
year=15;value=s15/100;output;
keep player year value;
run;

proc glimmix data=long;
class player year;
nloptions maxiter=1000;
model value=player year/dist=bin ddf=32;
random year/residual type=ar(1) group=player;
lsmeans player/diff ilink;
run;

I set the denominator degrees of freedom to 32, which is what the 'skeleton' ANOVA table (no RANDOM statement) provides in PROC GLM. I used the binomial distribution because the measures are the result of an undetermined number of Bernoulli trials (either made or missed).
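
For anyone who wants to check where the 32 comes from, here is a minimal sketch of the 'skeleton' ANOVA, assuming the long data set created above. It fits the same fixed effects in PROC GLM with no RANDOM statement; the error degrees of freedom in the resulting ANOVA table should be the value passed to DDF=.

```sas
/* Skeleton ANOVA: same fixed effects as the GLIMMIX model, no RANDOM statement. */
/* Rows with a missing value are dropped automatically by PROC GLM.              */
proc glm data=long;
   class player year;
   model value = player year;
run;
quit;
```

The arithmetic works out as follows: there are 47 non-missing player-seasons, so the corrected total has 46 degrees of freedom. Player uses 9 (10 players) and year uses 5 (6 seasons), which leaves 46 - 9 - 5 = 32 for error.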

SteveDenham
