Thanks for the explanation; that helps me understand how missing values work. I tried running the code without any filtering for the missing values, but I still get an error. Specifically:

ERROR: The VBOX variable must be numeric.

I assumed this error came from the missing values. Perhaps something else is causing it?

EDIT: I found the solution after the responder edited his post. This line is the correct solution: "Second, SAS will not let you change the type of a variable from character to numeric or vice versa. To get a numeric value, assuming duration is character, you should create a new variable as you can't use the old one."

I am coming from R, where you can convert a variable's type back and forth in place, so this was not intuitive for me in SAS. Here is the corrected code that works, in case anyone stumbles on this in the future:

* Get data 1;
filename test1234 temp;
proc http
url="https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-07/pit_stops.csv"
method="GET"
out=test1234;
run;
proc import out=pit_stops datafile=test1234 dbms=csv replace;
guessingrows = max;
run;
* Get data 2;
filename racecsv temp;
proc http
url="https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-09-07/races.csv"
method="GET"
out=racecsv;
run;
proc import out=races datafile=racecsv dbms=csv replace;
guessingrows = max;
run;
* Merge/join the data;
* Sorting is required in SAS to merge correctly;
proc sort data=pit_stops;
by raceId;
run;
proc sort data=races;
by raceId;
run;
data pit_stopdf;
merge pit_stops races;
by raceId;
run;
* Drop all(?) blank rows;
data pit_stopdf;
set pit_stopdf;
where year > 2010;
run;
* Change duration to numeric;
data pit_stopdf;
set pit_stopdf;
duration_numeric = input(duration, comma8.);
run;
* Plot the data;
proc sgplot data=pit_stopdf;
vbox duration_numeric / category=year;
title 'Formula1 pit stop duration';
xaxis label = "Year";
yaxis label = "Duration";
run;
title;
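
For anyone who wants to keep the original variable name, here is a minimal sketch of the same character-to-numeric conversion on a small made-up dataset. The dataset names (demo, demo_num), the sample values, and the helper name duration_char are all hypothetical and not part of the pit stop data. The sketch assumes duration was read as character, as in my case, and uses the RENAME= data set option so that a new numeric variable can take over the name duration.

```sas
* Hypothetical toy data with duration stored as character;
data demo;
  input raceId duration $;
  datalines;
841 23.227
841 26.898
842 1:02.5
;
run;

* Build a new numeric variable because the existing one cannot be retyped in place;
data demo_num;
  set demo(rename=(duration=duration_char));
  * The ?? modifier suppresses invalid-data notes and leaves unreadable values missing;
  duration = input(duration_char, ?? comma8.);
  drop duration_char;
run;

* Check the variable type in the output listing;
proc contents data=demo_num;
run;
```

In this sketch PROC CONTENTS should list duration as a numeric variable, and any value the informat cannot read ends up missing rather than stopping the step. Whether comma8. is the right informat for every value in the real duration column is an assumption worth checking against the data.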