Data visualization with SAS programming

Fun w/ODS Graphics: A Vector Plot Look at Katie Ledecky's Dominance of the Women's 800m Freestyle

Reply
Frequent Contributor
Frequent Contributor
Posts: 76

Fun w/ODS Graphics: A Vector Plot Look at Katie Ledecky's Dominance of the Women's 800m Freestyle

ledecky.png

 

If you didn't see it, Katie Ledecky completely dominated the field in the Women's 800 Freestyle on Friday. But how to show that in a chart is tricky, because the margin of victory - 11.38 seconds - gets lost when compared to total race time. To illustrate Ledecky's decisive victory, The NY Times used a cute-but-hardly-data-rich animated recreation of the 16-pool-length race. Back in the 2012 Olympics, the NYT came up with a lap chart that showed the ranking of swimmers at the end of each 50m split. But it didn't show how far behind swimmers were behind the leader, so I tried to address that with the chart above - kind of a vector plot take on a parallel coordinates plot - which illustrates how Ledecky maintained and extended her lead throughout the race. Scroll to the bottom of the below code/data to see the 12 statements that actually generate the chart!

 

* Chart Katie Ledecky's lead at each split of the Women's 800m Freestyle;

* Step 1. Munge messy cut-and-pasted PDF data into something chartable;

data ledecky(keep=rank lane name country split time);
length rank lane split 3. name $ 30. country $ 3.;
retain rank lane name country;
infile cards truncover;
input buffer $80.;
if substr(buffer,2,1)=' ' then do;             * Swimmer info?
  rank=scan(buffer,1,' ');
  lane=scan(buffer,2,' ');
  name=scan(buffer,3,' ');
  i=4;                                         
  do while(length(scan(buffer,i,' '))^=3);     * Look for 3-character country code;
    name=trim(name)||" "||scan(buffer,i, ' ');
    i+1;
  end;
  country=scan(buffer,i, ' ');
  name=trim(name)||' ('||country||')';
  split=0;
  time=input('00:00',hhmmss.);
  output;
  end;
else if index(buffer,'m') then do;             * 50m split time info ?;
  split=compress(scan(buffer,1,' '),'m');
  time=input(scan(buffer,-1,' '),hhmmss.);
  output;
  if split=750 then do;                        * Special logic for final split;
    input splittime : hhmmss.;
    split=800;
    time+splittime;
    output;
    end;
  end;
* Source: Cut-and-pasted PDF data from rio2016.com/en/swimming-womens-800m-freestyle-final;
cards;
1 4 LEDECKY Katie USA 0.70 8:04.79 WR
50m (1) 28.03
29.95
100m (1) 57.98
30.73
150m (1) 1:28.71
30.71
200m (1) 1:59.42
30.64
250m (1) 2:30.06
30.70
300m (1) 3:00.76
30.37
350m (1) 3:31.13
30.85
400m (1) 4:01.98
30.22
450m (1) 4:32.20
30.74
500m (1) 5:02.94
30.60
550m (1) 5:33.54
30.76
600m (1) 6:04.30
30.77
650m (1) 6:35.07
30.37
700m (1) 7:05.44
30.36
750m (1) 7:35.80
28.99
2 3 CARLIN Jazz GBR 0.74 8:16.17 11.38
50m (2) 28.88
31.01
100m (2) 59.89
31.01
150m (2) 1:30.90
31.11
200m (2) 2:02.01
31.26
250m (2) 2:33.27
31.35
300m (2) 3:04.62
31.39
350m (2) 3:36.01
31.46
400m (2) 4:07.47
31.37
450m (2) 4:38.84
31.41
500m (3) 5:10.25
31.47
550m (3) 5:41.72
31.41
600m (3) 6:13.13
31.28
650m (3) 6:44.41
31.66
700m (3) 7:16.07
30.70
750m (2) 7:46.77
29.40
3 5 KAPAS Boglarka HUN 0.68 8:16.37 11.58
50m (8) 29.62
31.18
100m (8) 1:00.80
31.31
150m (8) 1:32.11
30.98
200m (4) 2:03.09
31.23
250m (4) 2:34.32
30.82
300m (3) 3:05.14
31.38
350m (3) 3:36.52
31.05
400m (3) 4:07.57
31.31
450m (3) 4:38.88
31.17
500m (2) 5:10.05
31.39
550m (2) 5:41.44
31.42
600m (2) 6:12.86
31.52
650m (2) 6:44.38
31.39
700m (2) 7:15.77
31.51
750m (3) 7:47.28
29.09
4 8 BELMONTE GARCIA Mireia ESP 0.67 8:18.55 13.76
50m (5) 29.16
30.86
100m (3) 1:00.02
31.48
150m (3) 1:31.50
31.12
200m (3) 2:02.62
31.38
250m (3) 2:34.00
31.15
300m (4) 3:05.15
31.60
350m (4) 3:36.75
31.39
400m (4) 4:08.14
31.53
450m (4) 4:39.67
31.27
500m (4) 5:10.94
31.87
550m (4) 5:42.81
31.37
600m (4) 6:14.18
31.78
650m (4) 6:45.96
31.18
700m (4) 7:17.14
31.39
750m (4) 7:48.53
30.02
5 7 ASHWOOD Jessica AUS 0.78 8:20.32 15.53
50m (7) 29.29
31.00
100m (5) 1:00.29
31.75
150m (7) 1:32.04
31.63
200m (8) 2:03.67
31.79
250m (8) 2:35.46
31.62
300m (8) 3:07.08
31.84
350m (8) 3:38.92
31.86
400m (7) 4:10.78
31.45
450m (6) 4:42.23
31.58
500m (6) 5:13.81
31.55
550m (6) 5:45.36
31.47
600m (5) 6:16.83
31.35
650m (5) 6:48.18
31.44
700m (5) 7:19.62
31.26
750m (5) 7:50.88
29.44
6 6 SMITH Leah USA 0.76 8:20.95 16.16
50m (6) 29.18
31.04
100m (4) 1:00.22
31.62
150m (5) 1:31.84
31.74
200m (6) 2:03.58
31.73
250m (7) 2:35.31
31.57
300m (6) 3:06.88
31.93
350m (7) 3:38.81
31.80
400m (6) 4:10.61
31.74
450m (7) 4:42.35
31.75
500m (7) 5:14.10
31.74
550m (7) 5:45.84
31.50
600m (7) 6:17.34
31.57
650m (7) 6:48.91
31.12
700m (6) 7:20.03
31.09
750m (6) 7:51.12
29.83
7 2 FRIIS Lotte DEN 0.78 8:24.50 19.71
50m (=3) 29.05
31.31
100m (6) 1:00.36
31.15
150m (4) 1:31.51
31.72
200m (5) 2:03.23
31.42
250m (5) 2:34.65
31.91
300m (5) 3:06.56
31.61
350m (5) 3:38.17
31.89
400m (5) 4:10.06
31.55
450m (5) 4:41.61
31.95
500m (5) 5:13.56
31.64
550m (5) 5:45.20
32.03
600m (6) 6:17.23
31.55
650m (6) 6:48.78
32.29
700m (7) 7:21.07
31.58
750m (7) 7:52.65
31.85
8 1 KOHLER Sarah GER 0.73 8:27.75 22.96
50m (=3) 29.05
31.44
100m (7) 1:00.49
31.41
150m (6) 1:31.90
31.70
200m (7) 2:03.60
31.51
250m (6) 2:35.11
31.85
300m (7) 3:06.96
31.80
350m (6) 3:38.76
32.15
400m (8) 4:10.91
31.80
450m (8) 4:42.71
32.33
500m (8) 5:15.04
32.00
550m (8) 5:47.04
32.52
600m (8) 6:19.56
32.20
650m (8) 6:51.76
32.45
700m (8) 7:24.21
32.02
750m (8) 7:56.23
31.52
;
* 2. Compute difference from leader at each split as well as vector coordinates;

proc sql;                                      
create table ledecky2plot as 
select l1.name, l1.time as splittime, l1.diff as time, l1.split as x, l1.diff as y, 
       l0.split as xorigin, l0.diff as yorigin, case when l1.split=800 then l1.name end as namefinal
from (select l.*, l.time-m.mintime as diff from ledecky l 
      join (select split, min(time) as mintime from ledecky group by 1) m on l.split=m.split) l1
join (select l.*, l.time-m.mintime as diff from ledecky l 
      join (select split, min(time) as mintime from ledecky group by 1) m on l.split=m.split) l0
on l1.name=l0.name and l1.split-50=l0.split;   * Match split to prior split for vector coordinates;

* 3. Vector chart of differences in time from leader at each split, scatter plot of swimmer names
     at final split, and axis tables of both time behind leader and cumulative time;

proc format;
picture meter low-999='000'm;                  * Format meters for x-axis;       

ods listing image_dpi=300 gpath='/folders/myfolders';
ods graphics on / reset antialias width=11in height=8.5in imagename="ledecky";
proc sgplot data=ledecky2plot noautolegend;
title height=10pt "Leader of the Pack: How Katie Ledecky's Lead Built Up in Women's 800m Freestyle";
vector x=x y=y / group=name xorigin=xorigin yorigin=yorigin y2axis lineattrs=(pattern=solid);
y2axis display=(nolabel) valuesformat=mmss8.0 valueattrs=(size=6pt) grid;
yaxis valuesformat=mmss8.0 valueattrs=(size=6pt) grid label="TIME BEHIND LEADER (MM:SS)" labelattrs=(size=6pt);
xaxis display=(nolabel) grid values=(0 to 800 by 50) valueattrs=(size=6pt) valuesformat=meter.;
xaxistable time / class=name valueattrs=(size=6pt) title="TIME BEHIND LEADER (MM:SS)" labelattrs=(size=6pt) titleattrs=(size=6pt) ;      
xaxistable splittime / class=name valueattrs=(size=6pt) title="CUMULATIVE TIME (MM:SS)" labelattrs=(size=6pt) titleattrs=(size=6pt);        
format time splittime mmss8.2;
scatter y=y x=x / datalabel=namefinal datalabelattrs=(size=6pt) markerattrs=(size=0pt);
run;

 

Respected Advisor
Posts: 4,812

Re: Fun w/ODS Graphics: A Vector Plot Look at Katie Ledecky's Dominance of the Women's 800m Freestyl

Good illustration of the race. Maybe "time behind" could be converted into "distance behind" and correspond to what we actually see (measure visually) during the race?

PG
Frequent Contributor
Frequent Contributor
Posts: 76

Re: Fun w/ODS Graphics: A Vector Plot Look at Katie Ledecky's Dominance of the Women's 800m Freestyl

[ Edited ]

I had considered that - and in fact there's another software company that's currently featuring a very slick-looking DataViz that purports to show how far behind in meters sprinters from 1896 would have finished behind Usain Bolt in 2012 in the 100m dash.

 

However, when the only real data one has on competitors are split and completion times, I believe the only real comparisons one can legitimately make are time-based, since velocity is not a constant for swimmers or runners - seeing the amazing Usain Bolt run tonight certainly illustrated that!

 

It'd be interesting to know if the NY Times animation was based upon computed average velocities based on split times, or if they actually had more granular distance measurements at points in time for all swimmers (e.g., from the video). The former would be misleading (a disclosure would be nice if that was the case), but would probably be good enough for infographic work. Smiley Happy

Frequent Contributor
Frequent Contributor
Posts: 76

Re: Fun w/ODS Graphics: A Vector Plot Look at Katie Ledecky's Dominance of the Women's 800m Freestyl

[ Edited ]

Oops, maybe if I'd heeded PGStats' suggestion, I could have had my work in the New York Times. Smiley Very Happy

 

Yesterday the NYT ran Usain Bolt and the Fastest Men in the World Since 1896 - on the Same Track, which bears a strong resemblance to the Tableau chart that I critiqued above, which bears a strong resemblance to a 2012 NYT chart of 100m sprinters, which bears a strong resemblance to a 2012 NYT chart of 100m swimmers...

 

However, the three NYT'ers credited with the story do hint that the positions of the fastest men in the world on their charts don't really represent where the runners actually were on the track after 9.63 seconds, Usain Bolt's winning 100m time in 2012.

 

"We then pitted these runners against each other in an imaginary race, using their average speeds," is how the NY Times explains its methodology.

 

So, when the NY Times says, "we have Tom Burke, who won in Athens in 1896. His time, 12 seconds, puts him more than 60 feet behind the Bolts", keep in mind that this was figured using a simple calculation - (100m - 100m/12s * 9.63s) * 3.28084ft/m = 65ft - that's based on the assumption that sprinters have the same average velocity throughout a race, which Wired painstakingly illustrated is clearly not the case.

 

Oh well, good enough for NYT work. Smiley Wink

Ask a Question
Discussion stats
  • 3 replies
  • 1637 views
  • 4 likes
  • 2 in conversation