Hello.
I have vital statistics data, and I need to calculate the total number of deaths for each main group and subgroup of causes of death. The causes of death are coded using International Classification of Disease, versions 7 through 10, and each group or subgroup includes codes pooled from these four ICD versions. I’ve written code that labels each subgroup and also assigns labels to the main groups, such as deathcause 1 and deathcause 2. The code calculates the total counts for the subgroups (deathcause 2), and then for the remaining main groups (deathcause 1). However, what I want to achieve now is to compute the total number of deaths for each main group—for example, the total count for deathcause 1, and the total count for deathcause 2.
Thank you!
data orig_data;
input ID $ sex death_cause $ ICD;
datalines;
A 1 A00 10
B 2 A00.1 10
C 2 A00.2 10
D 2 B00 10
C 1 B00.3 10
.
.
.
A 1 A00 9
B 2 A00.1 9
C 1 A00.2 9
D 1 B00 9
C 1 B00.3 9
.
.
.
run;
data new; set orig_data;
IF ICD = 10 then do;
IF "A00" <= substr(death_cause,1,3) <= "A99" then do;
IF substr(death_cause,1,3) in ("A20","A21","A25") then do; deathcause_2=2; end;
Else deathcause_1=1;
end;
......................
IF ICD = 9 then do;
IF "001" <= substr(death_cause,1,3) <= "139" then do;
IF substr(death_cause,1,3) in ("032","033","034") then do; deathcause_2=2; end;
Else deathcause_1=1;
end;
.....................
The code you provided is syntactically incorrect (and I'm not talking about dots that can be easily removed).
Could you also provided expected result for the data you presented?
Bart
Sorry. My code is actually like below. It works well. But, I need a code that calculate total deathcause_1 and total deathcause_2. Thank you.
data orig_data;
input ID $ sex death_cause $ ICD;
datalines;
A 1 A00 10
B 2 A00.1 10
C 2 A00.2 10
D 2 B00 10
C 1 B00.3 10
.
.
.
A 1 A00 9
B 2 A00.1 9
C 1 A00.2 9
D 1 B00 9
C 1 B00.3 9
.
.
.
run;
data new; set orig_data;
IF ICD = 10 then do;
IF "A00" <= substr(death_cause,1,3) <= "A99" then do;
IF substr(death_cause,1,3) in ("A20","A21","A25") then do; deathcause_2=2; end;
Else deathcause_1=1; End;
end;
......................IF ICD = 9 then do;
IF "001" <= substr(death_cause,1,3) <= "139" then do;
IF substr(death_cause,1,3) in ("032","033","034") then do; deathcause_2=2; end;
Else deathcause_1=1; End;
end;.....................
You can add a sum statement to accumulate the total. Is this what you are trying to do:
data new; set orig_data;
IF ICD = 10 then do;
IF "A00" <= substr(death_cause,1,3) <= "A99" then do;
IF substr(death_cause,1,3) in ("A20","A21","A25") then do; deathcause_2=2; end;
Else deathcause_1=1; End;
end;
IF ICD = 9 then do;
IF "001" <= substr(death_cause,1,3) <= "139" then do;
IF substr(death_cause,1,3) in ("032","033","034") then do; deathcause_2=2; end;
Else deathcause_1=1; End;
end;
total_deathcause_2+deathcause_2;
total_deathcause_1+deathcause_1;
run;
Not sure I follow your example.
What is the meaning of these DEATHCLAUSE_1 and DEATCHCLAUSE_2 variables? What is the meaning of the 1 and 2 values that you are putting into the variables?
But it sounds like you have a variable that is coded with multiple values like 1 and 2 and you want to count how many ones and twos there are. PROC FREQ can easily do that.
If you want to summarize data on multiple different dimensions at once you can use PROC SUMMARY. Since it appears that all of your observations are deaths of one cause or another so just use the _FREQ_ output variable as your count. You can use the WAYS or TYPES statement to indicate the combinations of your CLASS variables you want output. If instead your data included some observations that were not deaths then make a variable with values of 1 (true/yes/on) and 0 (false/no/off) and then calculate the SUM statistic to get the count.
But if you want to accumulate the counts in a data step as you process the observations you could use a SUM statement and take advantage of the fact that SAS evaluates boolean expressions as either 1 or 0. So you could make separate variables that accumulates the number of observations with that particular value. Example:
data want;
set have;
num_cause1 + (cause=1);
num_cause2 + (cause=2);
run;
This sort of nested definition is one of the places that MULTILABEL formats can be useful. The limitation of these is that one a few procedures will use them but since Proc Summary/Means, Report and Tabulate will then counting is quite possible.
Strangely enough the example below also uses a Vital Statistics example though the data is artificial and you generate it.
This example was designed more to show ways that options affect appearance of reports in Proc Tabulate but the approach should be similar. Part of this exercise is to show that the way the Multilabel format is defined affects appearance as in order of rows in this case for same values.
/* To demonstrate how the order of definition affects appearance in multilabel formats. Also appearance options to show the spaces to get the indent as desired and fix column widths. And investigate whether class level format based style overrides work with MLF in proc tabulate. Result: NO. */ proc format library=work; value accidentl (multilabel notsorted) 1-5 = 'Accidents' 1-3 = ' Transport accidents' 1 = ' Motor vehicle accidents' 2 = ' Water, air, and space' 3 = ' Other land transport accidents' 4-5 = ' Nontransport accidents' 5 = ' Fishing' ; value accidentr (multilabel notsorted) 1-5 = 'Accidents' 1-3 = ' Transport accidents' 4-5 = ' Nontransport accidents' 1 = ' Motor vehicle accidents' 2 = ' Water, air, and space' 3 = ' Other land transport accidents' 5 = ' Fishing' ; value accidents (multilabel notsorted) 1 = ' Motor vehicle accidents' 2 = ' Water, air, and space' 3 = ' Other land transport accidents' 5 = ' Fishing' 1-5 = 'Accidents' 1-3 = ' Transport accidents' 4-5 = ' Nontransport accidents' ; value mf 1 = "Male" 2 = "Female" ; value mycolor 1='white' 2='red' 3='green' 4='blue' 5='orange' 6='purple' ; value accsimple 1 = 'Motor vehicle accidents' 2 = 'Water, air, and space' 3 = 'Other land transport accidents' 4 = 'Nontransport accidents' 5 = 'Fishing' ; value MyColorl (multilabel notsorted) 1-5 = 'white' 1-3 = 'red' 1 = 'blue' 2 = 'orange' 3 = 'pink' 4-5 = 'purple' 5 = 'black' ; run; /* populate a dataset to display */ /* This specifically does NOT generate any data for FISHING above*/ /* to display the behavior of the options below in those cases. */ data junk; do i=1 to 50; type = round(4*ranuni(1234)+.5); sex = round(2*ranuni(3455)+.5); output; end; run; /* Notice that before we get here the data is NOT sorted */ /* in any manner!!!! */ title 'Option Order=Data Preloadfmt Printmiss Misstext=0 format accidentl'; proc tabulate data=junk order=data ; class type / mlf PRELOADFMT; /*ASIS preserves the leading spaces in the format, Cellwidth here is optional for this demo. These will ONLY affect ODS output such as HTML, RTF or PDF not the OUTPUT window */ classlev type/ style=[asis=on cellwidth=1.5in]; class sex; /* the formatted values of SEX normally take up different lengths, this fixes the column widths to be the same for both headers. May cause interesting appearances with large numbers of displayed digits if requested in formats*/ classlev sex/ style=[cellwidth=.5in]; /* to get the ALL to have the same width as SEX need to specify this way*/ table type=' ', all='Total'*{style=[cellwidth=.5in]}*n=' '*f=comma9. sex=' '*n=' '*f=comma8. / printmiss misstext='0'; format type accidentl. sex mf.; run;title; title 'Option simple format and classlev background color'; proc tabulate data=junk order=data ; class type / ; /*ASIS preserves the leading spaces in the format, Cellwidth here is optional for this demo. These will ONLY affect ODS output such as HTML, RTF or PDF not the OUTPUT window */ classlev type/ style=[asis=on cellwidth=1.5in background=mycolor.]; class sex; /* the formatted values of SEX normally take up different lengths, this fixes the column widths to be the same for both headers. May cause interesting appearances with large numbers of displayed digits if requested in formats*/ classlev sex/ style=[cellwidth=.5in]; /* to get the ALL to have the same width as SEX need to specify this way*/ table type=' ', all='Total'*{style=[cellwidth=.5in]}*n=' '*f=comma9. sex=' '*n=' '*f=comma8. / row=float misstext='0'; format type accsimple. sex mf.; run;title; title 'Option Order=Data Preloadfmt Printmiss Misstext=0 format accidentl classlev background'; proc tabulate data=junk order=data; class type / mlf PRELOADFMT; classlev type/ style=[asis=on cellwidth=1.5in background=mycolor.]; class sex; classlev sex/ style=[cellwidth=.5in]; table type=' ', all='Total'*{style=[cellwidth=.5in]}*n=' '*f=comma9. sex=' '*n=' '*f=comma8. / printmiss misstext='0'; format type accidentl. sex mf.; run;title; title 'Option Order=Data Preloadfmt Printmiss Misstext=0 format accidents'; proc tabulate data=junk order=data; class type / mlf PRELOADFMT; classlev type/ style=[asis=on cellwidth=1.5in]; class sex; classlev sex/ style=[cellwidth=.5in]; table type=' ', all='Total'*{style=[cellwidth=.5in]}*n=' '*f=comma9. sex=' '*n=' '*f=comma8. / printmiss misstext='0'; format type accidents. sex mf.; run;title;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.