BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nbku222
Calcite | Level 5

Hello, I am learning SAS and I have this example data:

 

raceID FatalitySurface
xxx0011YDirt
xxx0012YSynth
xxx0013NDirt
xxx0014NDirt
xxx0015NSynth
xxx0016NDirt
xxx0017YSynth
xxx0018YSynth
xxx0029YSynth
xxx00210YDirt
xxx00211NTurf
xxx00212NDirt
xxx00213NDirt
xxx00214NDirt
xxx00215YTurf
xxx00216YTurf
xxx00317NSynth
xxx00318NDirt
xxx00319NDirt
xxx00320NDirt
xxx00321YDirt
xxx00322YDirt
xxx00323NDirt

 

What is the best way of calculating fatality and non fatality rates by surface type in SAS? 

 

Thank you

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
KachiM
Rhodochrosite | Level 12

If you like a little Data Step programming then here is the way. Count the Ys and Ns by Surface. Then find Ys/(Ys +Ns) *100 as the Fatality ratio. Then Non Fatality Ratio is 100 - Fatality Ratio. 

 

data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001   1  Y  Dirt
xxx001   2  Y  Synth
xxx001   3  N  Dirt
xxx001   4  N  Dirt
xxx001   5  N  Synth
xxx001   6  N  Dirt
xxx001   7  Y  Synth
xxx001   8  Y  Synth
xxx002   9  Y  Synth
xxx002   10 Y  Dirt
xxx002   11 N  Turf
xxx002   12 N  Dirt
xxx002   13 N  Dirt
xxx002   14 N  Dirt
xxx002   15 Y  Turf
xxx002   16 Y  Turf
xxx003   17 N  Synth
xxx003   18 N  Dirt
xxx003   19 N  Dirt
xxx003   20 N  Dirt
xxx003   21 Y  Dirt
xxx003   22 Y  Dirt
xxx003   23 N  Dirt
;
run;

data want;
   set have end = eof;
   surf = 'Dirt Synth Turf';
   array Y[3] _temporary_;
   array N[3] _temporary_;
   i = findw(surf, trim(Surface), ' ', 'E');
   if Fatality = 'Y' then Y[i] + 1;
   else N[i] + 1;
   
   if eof then do i = 1 to 3;
      Surface = scan(surf, i);

      FatalityRatio = Y[i] * 100 / (Y[i] + N[i]);
      output;
   end;
   keep Surface FatalityRatio;
run;

The output is:

 

Obs 	Surface 	FatalityRatio
1 	Dirt 	28.5714
2 	Synth 	66.6667
3 	Turf 	66.6667

View solution in original post

7 REPLIES 7
Reeza
Super User
I would recode Y/N to 1/0 and then do a proc means on the data set. The MEAN of a binary variable is the percentage or rate you want.

KachiM
Rhodochrosite | Level 12

If you like a little Data Step programming then here is the way. Count the Ys and Ns by Surface. Then find Ys/(Ys +Ns) *100 as the Fatality ratio. Then Non Fatality Ratio is 100 - Fatality Ratio. 

 

data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001   1  Y  Dirt
xxx001   2  Y  Synth
xxx001   3  N  Dirt
xxx001   4  N  Dirt
xxx001   5  N  Synth
xxx001   6  N  Dirt
xxx001   7  Y  Synth
xxx001   8  Y  Synth
xxx002   9  Y  Synth
xxx002   10 Y  Dirt
xxx002   11 N  Turf
xxx002   12 N  Dirt
xxx002   13 N  Dirt
xxx002   14 N  Dirt
xxx002   15 Y  Turf
xxx002   16 Y  Turf
xxx003   17 N  Synth
xxx003   18 N  Dirt
xxx003   19 N  Dirt
xxx003   20 N  Dirt
xxx003   21 Y  Dirt
xxx003   22 Y  Dirt
xxx003   23 N  Dirt
;
run;

data want;
   set have end = eof;
   surf = 'Dirt Synth Turf';
   array Y[3] _temporary_;
   array N[3] _temporary_;
   i = findw(surf, trim(Surface), ' ', 'E');
   if Fatality = 'Y' then Y[i] + 1;
   else N[i] + 1;
   
   if eof then do i = 1 to 3;
      Surface = scan(surf, i);

      FatalityRatio = Y[i] * 100 / (Y[i] + N[i]);
      output;
   end;
   keep Surface FatalityRatio;
run;

The output is:

 

Obs 	Surface 	FatalityRatio
1 	Dirt 	28.5714
2 	Synth 	66.6667
3 	Turf 	66.6667
nbku222
Calcite | Level 5

Thank you that worked perfectly. How do I do it by year if also have time variable?

novinosrin
Tourmaline | Level 20
data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001   1  Y  Dirt
xxx001   2  Y  Synth
xxx001   3  N  Dirt
xxx001   4  N  Dirt
xxx001   5  N  Synth
xxx001   6  N  Dirt
xxx001   7  Y  Synth
xxx001   8  Y  Synth
xxx002   9  Y  Synth
xxx002   10 Y  Dirt
xxx002   11 N  Turf
xxx002   12 N  Dirt
xxx002   13 N  Dirt
xxx002   14 N  Dirt
xxx002   15 Y  Turf
xxx002   16 Y  Turf
xxx003   17 N  Synth
xxx003   18 N  Dirt
xxx003   19 N  Dirt
xxx003   20 N  Dirt
xxx003   21 Y  Dirt
xxx003   22 Y  Dirt
xxx003   23 N  Dirt
;
run;

proc sql;
create table want as
select surface, sum(fatality='Y')/n(surface) as Fatality_ratio /*if you want->format=percent5.2*/
from have
group by surface;
quit;

@Reeza 's words have demonstrated 

novinosrin
Tourmaline | Level 20

Datastep isn't convenient for a column operation that SQL/MEANS/SUMMARY offers in parallel that can be coded at ease, nonetheless fun stuff

 

data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001   1  Y  Dirt
xxx001   2  Y  Synth
xxx001   3  N  Dirt
xxx001   4  N  Dirt
xxx001   5  N  Synth
xxx001   6  N  Dirt
xxx001   7  Y  Synth
xxx001   8  Y  Synth
xxx002   9  Y  Synth
xxx002   10 Y  Dirt
xxx002   11 N  Turf
xxx002   12 N  Dirt
xxx002   13 N  Dirt
xxx002   14 N  Dirt
xxx002   15 Y  Turf
xxx002   16 Y  Turf
xxx003   17 N  Synth
xxx003   18 N  Dirt
xxx003   19 N  Dirt
xxx003   20 N  Dirt
xxx003   21 Y  Dirt
xxx003   22 Y  Dirt
xxx003   23 N  Dirt
;
run;
data want; 
if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("surface") ;
   h.definedata ("surface","_N","_count") ;
   h.definedone () ;
   dcl hiter hi('h');
end;
do until(z);
 set have end=z;
 if h.find() ne 0 then do;
 _N=1;_count=0;
 end;
 else _n=sum(_n,1);
 _count=sum(_count,Fatality='Y');
 h.replace();
end;
do while(hi.next()=0);
 Fatality_ratio=_count/_n;
 output;
end;
keep Surface Fatality_ratio;
run;
novinosrin
Tourmaline | Level 20

SORT and DOW

 

data have;
input race $6. ID @13 Fatality $1. Surface $;
datalines;
xxx001   1  Y  Dirt
xxx001   2  Y  Synth
xxx001   3  N  Dirt
xxx001   4  N  Dirt
xxx001   5  N  Synth
xxx001   6  N  Dirt
xxx001   7  Y  Synth
xxx001   8  Y  Synth
xxx002   9  Y  Synth
xxx002   10 Y  Dirt
xxx002   11 N  Turf
xxx002   12 N  Dirt
xxx002   13 N  Dirt
xxx002   14 N  Dirt
xxx002   15 Y  Turf
xxx002   16 Y  Turf
xxx003   17 N  Synth
xxx003   18 N  Dirt
xxx003   19 N  Dirt
xxx003   20 N  Dirt
xxx003   21 Y  Dirt
xxx003   22 Y  Dirt
xxx003   23 N  Dirt
;
run;

proc sort data=have;
by Surface ;
run;

data want;
 do _n_=1 by 1 until(last.surface);
 set have;
 by surface;
 _c=sum(_c,fatality='Y');
 end;
 Fatality_ratio=_c/_n_;
 keep surface Fatality_ratio;
run;
KachiM
Rhodochrosite | Level 12

@nbku222 

 

I was sleeping when you inquired on 'year and time'. 

I don't understand the input data set for this situation. Please place your question with a new subject line so that a new thread is created which will be current.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 741 views
  • 2 likes
  • 4 in conversation