Hello, I am learning SAS and I have this example data:
race | ID | Fatality | Surface |
xxx001 | 1 | Y | Dirt |
xxx001 | 2 | Y | Synth |
xxx001 | 3 | N | Dirt |
xxx001 | 4 | N | Dirt |
xxx001 | 5 | N | Synth |
xxx001 | 6 | N | Dirt |
xxx001 | 7 | Y | Synth |
xxx001 | 8 | Y | Synth |
xxx002 | 9 | Y | Synth |
xxx002 | 10 | Y | Dirt |
xxx002 | 11 | N | Turf |
xxx002 | 12 | N | Dirt |
xxx002 | 13 | N | Dirt |
xxx002 | 14 | N | Dirt |
xxx002 | 15 | Y | Turf |
xxx002 | 16 | Y | Turf |
xxx003 | 17 | N | Synth |
xxx003 | 18 | N | Dirt |
xxx003 | 19 | N | Dirt |
xxx003 | 20 | N | Dirt |
xxx003 | 21 | Y | Dirt |
xxx003 | 22 | Y | Dirt |
xxx003 | 23 | N | Dirt |
What is the best way of calculating fatality and non fatality rates by surface type in SAS?
Thank you
If you like a little Data Step programming then here is the way. Count the Ys and Ns by Surface. Then find Ys/(Ys +Ns) *100 as the Fatality ratio. Then Non Fatality Ratio is 100 - Fatality Ratio.
data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001 1 Y Dirt
xxx001 2 Y Synth
xxx001 3 N Dirt
xxx001 4 N Dirt
xxx001 5 N Synth
xxx001 6 N Dirt
xxx001 7 Y Synth
xxx001 8 Y Synth
xxx002 9 Y Synth
xxx002 10 Y Dirt
xxx002 11 N Turf
xxx002 12 N Dirt
xxx002 13 N Dirt
xxx002 14 N Dirt
xxx002 15 Y Turf
xxx002 16 Y Turf
xxx003 17 N Synth
xxx003 18 N Dirt
xxx003 19 N Dirt
xxx003 20 N Dirt
xxx003 21 Y Dirt
xxx003 22 Y Dirt
xxx003 23 N Dirt
;
run;
data want;
set have end = eof;
surf = 'Dirt Synth Turf';
array Y[3] _temporary_;
array N[3] _temporary_;
i = findw(surf, trim(Surface), ' ', 'E');
if Fatality = 'Y' then Y[i] + 1;
else N[i] + 1;
if eof then do i = 1 to 3;
Surface = scan(surf, i);
FatalityRatio = Y[i] * 100 / (Y[i] + N[i]);
output;
end;
keep Surface FatalityRatio;
run;
The output is:
Obs Surface FatalityRatio 1 Dirt 28.5714 2 Synth 66.6667 3 Turf 66.6667
If you like a little Data Step programming then here is the way. Count the Ys and Ns by Surface. Then find Ys/(Ys +Ns) *100 as the Fatality ratio. Then Non Fatality Ratio is 100 - Fatality Ratio.
data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001 1 Y Dirt
xxx001 2 Y Synth
xxx001 3 N Dirt
xxx001 4 N Dirt
xxx001 5 N Synth
xxx001 6 N Dirt
xxx001 7 Y Synth
xxx001 8 Y Synth
xxx002 9 Y Synth
xxx002 10 Y Dirt
xxx002 11 N Turf
xxx002 12 N Dirt
xxx002 13 N Dirt
xxx002 14 N Dirt
xxx002 15 Y Turf
xxx002 16 Y Turf
xxx003 17 N Synth
xxx003 18 N Dirt
xxx003 19 N Dirt
xxx003 20 N Dirt
xxx003 21 Y Dirt
xxx003 22 Y Dirt
xxx003 23 N Dirt
;
run;
data want;
set have end = eof;
surf = 'Dirt Synth Turf';
array Y[3] _temporary_;
array N[3] _temporary_;
i = findw(surf, trim(Surface), ' ', 'E');
if Fatality = 'Y' then Y[i] + 1;
else N[i] + 1;
if eof then do i = 1 to 3;
Surface = scan(surf, i);
FatalityRatio = Y[i] * 100 / (Y[i] + N[i]);
output;
end;
keep Surface FatalityRatio;
run;
The output is:
Obs Surface FatalityRatio 1 Dirt 28.5714 2 Synth 66.6667 3 Turf 66.6667
Thank you that worked perfectly. How do I do it by year if also have time variable?
data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001 1 Y Dirt
xxx001 2 Y Synth
xxx001 3 N Dirt
xxx001 4 N Dirt
xxx001 5 N Synth
xxx001 6 N Dirt
xxx001 7 Y Synth
xxx001 8 Y Synth
xxx002 9 Y Synth
xxx002 10 Y Dirt
xxx002 11 N Turf
xxx002 12 N Dirt
xxx002 13 N Dirt
xxx002 14 N Dirt
xxx002 15 Y Turf
xxx002 16 Y Turf
xxx003 17 N Synth
xxx003 18 N Dirt
xxx003 19 N Dirt
xxx003 20 N Dirt
xxx003 21 Y Dirt
xxx003 22 Y Dirt
xxx003 23 N Dirt
;
run;
proc sql;
create table want as
select surface, sum(fatality='Y')/n(surface) as Fatality_ratio /*if you want->format=percent5.2*/
from have
group by surface;
quit;
@Reeza 's words have demonstrated
Datastep isn't convenient for a column operation that SQL/MEANS/SUMMARY offers in parallel that can be coded at ease, nonetheless fun stuff
data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001 1 Y Dirt
xxx001 2 Y Synth
xxx001 3 N Dirt
xxx001 4 N Dirt
xxx001 5 N Synth
xxx001 6 N Dirt
xxx001 7 Y Synth
xxx001 8 Y Synth
xxx002 9 Y Synth
xxx002 10 Y Dirt
xxx002 11 N Turf
xxx002 12 N Dirt
xxx002 13 N Dirt
xxx002 14 N Dirt
xxx002 15 Y Turf
xxx002 16 Y Turf
xxx003 17 N Synth
xxx003 18 N Dirt
xxx003 19 N Dirt
xxx003 20 N Dirt
xxx003 21 Y Dirt
xxx003 22 Y Dirt
xxx003 23 N Dirt
;
run;
data want;
if _n_=1 then do;
dcl hash H () ;
h.definekey ("surface") ;
h.definedata ("surface","_N","_count") ;
h.definedone () ;
dcl hiter hi('h');
end;
do until(z);
set have end=z;
if h.find() ne 0 then do;
_N=1;_count=0;
end;
else _n=sum(_n,1);
_count=sum(_count,Fatality='Y');
h.replace();
end;
do while(hi.next()=0);
Fatality_ratio=_count/_n;
output;
end;
keep Surface Fatality_ratio;
run;
SORT and DOW
data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001 1 Y Dirt
xxx001 2 Y Synth
xxx001 3 N Dirt
xxx001 4 N Dirt
xxx001 5 N Synth
xxx001 6 N Dirt
xxx001 7 Y Synth
xxx001 8 Y Synth
xxx002 9 Y Synth
xxx002 10 Y Dirt
xxx002 11 N Turf
xxx002 12 N Dirt
xxx002 13 N Dirt
xxx002 14 N Dirt
xxx002 15 Y Turf
xxx002 16 Y Turf
xxx003 17 N Synth
xxx003 18 N Dirt
xxx003 19 N Dirt
xxx003 20 N Dirt
xxx003 21 Y Dirt
xxx003 22 Y Dirt
xxx003 23 N Dirt
;
run;
proc sort data=have;
by Surface ;
run;
data want;
do _n_=1 by 1 until(last.surface);
set have;
by surface;
_c=sum(_c,fatality='Y');
end;
Fatality_ratio=_c/_n_;
keep surface Fatality_ratio;
run;
I was sleeping when you inquired on 'year and time'.
I don't understand the input data set for this situation. Please place your question with a new subject line so that a new thread is created which will be current.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.