Hi,
I have a SAS program that is reading in death(mortality) data by age and I created the following code for five age groups:
IF AGE GE 0 AND AGE lt 17 THEN AGEGP1="0-17";
ELSE IF 18<=AGE<=29 THEN AGEGP1="18-29";
ELSE IF 30<=AGE<=44 THEN AGEGP1="30-44";
ELSE IF 45<=AGE<=64 THEN AGEGP1="45-64";
ELSE IF AGE ge 65 THEN AGEGP1="65+";
ELSE AGEGP1=" ";
Next, I run the following Proc Freq:
proc freq data=deaths_&year. noprint;
tables RACEGP*AGEGP1*SEXGP/out=TEST_totall;
run;
I know I wouldn't see any deaths for example in "Alzheimers Disease" in the younger age groups(0-17, 18-29, 30-44), therefore they wouldn't show in the output dataset. However, I want to keep all of the age groups in the output dataset and assign 0 to the age groups that don't have any observations for a given cause of death.
My final output dataset should display all 5 of the age groups. Can anyone please advise or suggest how to code for this? Thanks.
Here's an example of how to do this using SASHELP.CLASS.
proc format;
value agef 0-10='0-10' 11-12='11-12' 13-14='13-14' 15-high='>15';
run;
data class;
set sashelp.class;
format age agef.;
run;
proc summary data=class completetypes nway;
class age/preloadfmt;
format age agef.;
var height;
output out=n n=n;
run;
Tried the Sparse and it didn't keep all the age groups. I will lookup the PRELOADFMT and see how that works.
By creating a new variable named AGEGRP1, PROC FREQ (even with the SPARSE option) only knows about the levels in the actual data set, and if a level doesn't appear in the data set, then PROC FREQ can't find it and put it in the table.
Instead of creating a new variable named AGEGRP1, you can create a custom format and then apply this custom format to variable AGE. Then PROC FREQ, with the SPARSE option, should give the desired output.
I don't think PROC FREQ has COMPLETETYPES or PRELOADFMT options; however PROC SUMMARY does have these and could also work here if the proper format is applied.
If you have already completed the PROC FREQ, you could build upon it to create the complete set of categories. For example:
proc sql;
create table races as select distinct racegp from test_totall;
create table ages as select distinct agegp1 from test_totall;
create table genders as select distinct sexgp from test_totall;
create table complete_shell as select * from races, ages, genders;
quit;
This gives you a complete set of all combinations of your variables, with no counts filled in. It could be sorted and merged with TEST_totall and the observations with a missing COUNT could be set to 0 at that point.
Also note, the AGEGP1 might be missing a character from the end. The first value you set is "0-17" which is four characters long. That means all the values will be four characters long, missing the fifth character. So before all the logic you posted, you should add:
length agegp1 $ 5;
Hi, @Reeza @PaigeMiller
I've actually tried all of the suggestions including yours @Astounding and I'm still not getting the desired output. For example, I'm selecting for cervical cancer deaths for Chicago by the 5 age groups. In my dataset, the age groups '0-17' and '18-29' are missing because there are no cervical cancer deaths in these age groups. I basically need to figured how to keep/include the age groups even though they don't exist.
Here's an example of how to do this using SASHELP.CLASS.
proc format;
value agef 0-10='0-10' 11-12='11-12' 13-14='13-14' 15-high='>15';
run;
data class;
set sashelp.class;
format age agef.;
run;
proc summary data=class completetypes nway;
class age/preloadfmt;
format age agef.;
var height;
output out=n n=n;
run;
Hi,
Thanks for the code. I was able to keep all of the age groups. @Astounding Here is what the code looks like:
PROC FORMAT;
VALUE $AGEGPP
' 0' - ' 17' = '0-17'
' 18'-' 29' = '18-29'
' 30'-' 44' = '30-44'
' 45'-' 64' = '45-64'
OTHER = '65+';
proc summary data=deaths_&year. completetypes nway;
class racegp AGEGP1 sexgp/preloadfmt;
format AGEGP1 $AGEGPP.;
output out=TEST_totall;* n=n;
run;
You can create a complete set of cells by going back to your original data and using that as the input to PROC SQL. So that would not depend on any medical conditions. The only way a category would be omitted is if it does not occur at all in the data, for any medical conditions. As long as it occurs at least once for any medical condition, it would be included.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.