Solved: Keeping all variables and put zero when variables are blank

Chgo5 · Posted 10-29-2021 04:31 PM

Hi,

I have a SAS program that is reading in death(mortality) data by age and I created the following code for five age groups:

IF AGE GE 0 AND AGE lt 17 THEN AGEGP1="0-17";
ELSE IF 18<=AGE<=29 THEN AGEGP1="18-29";
ELSE IF 30<=AGE<=44 THEN AGEGP1="30-44";
ELSE IF 45<=AGE<=64 THEN AGEGP1="45-64";
ELSE IF AGE ge 65 THEN AGEGP1="65+";

ELSE AGEGP1=" ";

Next, I run the following Proc Freq:

proc freq data=deaths_&year. noprint;
tables RACEGP*AGEGP1*SEXGP/out=TEST_totall;
run;

I know I wouldn't see any deaths for example in "Alzheimers Disease" in the younger age groups(0-17, 18-29, 30-44), therefore they wouldn't show in the output dataset. However, I want to keep all of the age groups in the output dataset and assign 0 to the age groups that don't have any observations for a given cause of death.

My final output dataset should display all 5 of the age groups. Can anyone please advise or suggest how to code for this? Thanks.

PaigeMiller · Posted 11-01-2021 05:15 PM

Here's an example of how to do this using SASHELP.CLASS.

proc format;
    value agef 0-10='0-10' 11-12='11-12' 13-14='13-14' 15-high='>15';
run;
data class;
    set sashelp.class;
    format age agef.;
run;
proc summary data=class completetypes nway;
    class age/preloadfmt;
    format age agef.;
    var height;
    output out=n n=n;
run;

--
Paige Miller

View solution in original post

Reeza · Posted 10-29-2021 04:40 PM

If all categories are in your data somehow you can use the SPARSE option which builds 'full' tables.

tables RACEGP*AGEGP1*SEXGP/out=TEST_totall sparse;

If you have more complex situation then you need to use a PRELOADFMT, I'll let you google that one. Tons of examples on here.

Chgo5 · Posted 10-29-2021 04:47 PM

Tried the Sparse and it didn't keep all the age groups. I will lookup the PRELOADFMT and see how that works.

Reeza · Posted 10-29-2021 04:52 PM

COMPLETETYPES is the other option.
FYI - if sparse didn't work that means you'll have a few age groups that have 0 throughout your data for all diseases/categories.

PaigeMiller · Posted 10-29-2021 05:07 PM

By creating a new variable named AGEGRP1, PROC FREQ (even with the SPARSE option) only knows about the levels in the actual data set, and if a level doesn't appear in the data set, then PROC FREQ can't find it and put it in the table.

Instead of creating a new variable named AGEGRP1, you can create a custom format and then apply this custom format to variable AGE. Then PROC FREQ, with the SPARSE option, should give the desired output.

I don't think PROC FREQ has COMPLETETYPES or PRELOADFMT options; however PROC SUMMARY does have these and could also work here if the proper format is applied.

--
Paige Miller

Reeza · Posted 10-29-2021 05:13 PM

Added benefit of using a format, your age categories usually sort correctly by default then which becomes important later on 🙂

Astounding · Posted 10-30-2021 07:10 AM

If you have already completed the PROC FREQ, you could build upon it to create the complete set of categories. For example:

proc sql;
   create table races as select distinct racegp from test_totall;
   create table ages as select distinct agegp1 from test_totall;
   create table genders as select distinct sexgp from test_totall;
   create table complete_shell as select * from races, ages, genders;
quit;

This gives you a complete set of all combinations of your variables, with no counts filled in. It could be sorted and merged with TEST_totall and the observations with a missing COUNT could be set to 0 at that point.

Also note, the AGEGP1 might be missing a character from the end. The first value you set is "0-17" which is four characters long. That means all the values will be four characters long, missing the fifth character. So before all the logic you posted, you should add:

length agegp1 $ 5;

Chgo5 · Posted 11-01-2021 04:44 PM

Hi, @Reeza @PaigeMiller

I've actually tried all of the suggestions including yours @Astounding and I'm still not getting the desired output. For example, I'm selecting for cervical cancer deaths for Chicago by the 5 age groups. In my dataset, the age groups '0-17' and '18-29' are missing because there are no cervical cancer deaths in these age groups. I basically need to figured how to keep/include the age groups even though they don't exist.

PaigeMiller · Posted 11-01-2021 05:15 PM

Here's an example of how to do this using SASHELP.CLASS.

proc format;
    value agef 0-10='0-10' 11-12='11-12' 13-14='13-14' 15-high='>15';
run;
data class;
    set sashelp.class;
    format age agef.;
run;
proc summary data=class completetypes nway;
    class age/preloadfmt;
    format age agef.;
    var height;
    output out=n n=n;
run;

--
Paige Miller

Chgo5 · Posted 11-02-2021 01:41 PM

Hi,

Thanks for the code. I was able to keep all of the age groups. @Astounding Here is what the code looks like:

PROC FORMAT;

VALUE $AGEGPP
' 0' - ' 17' = '0-17'
' 18'-' 29' = '18-29'
' 30'-' 44' = '30-44'
' 45'-' 64' = '45-64'
OTHER = '65+';

proc summary data=deaths_&year. completetypes nway;
class racegp AGEGP1 sexgp/preloadfmt;
format AGEGP1 $AGEGPP.;
output out=TEST_totall;* n=n;
run;

Reeza · Posted 11-01-2021 06:03 PM

Show what you've tried with either CLASSDATA or PRELOADFMT please.

Astounding · Posted 11-02-2021 10:39 AM

You can create a complete set of cells by going back to your original data and using that as the input to PROC SQL. So that would not depend on any medical conditions. The only way a category would be omitted is if it does not occur at all in the data, for any medical conditions. As long as it occurs at least once for any medical condition, it would be included.

Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

Re: Keeping all variables and put zero when variables are blank

The 2025 SAS Hackathon Kicks Off on June 11!

SAS Training: Just a Click Away