BookmarkSubscribeRSS Feed
Greg95
Calcite | Level 5

Hello everyone,

The question I have is I am cleaning data for use with a predictive model and one of the tables I am looking at has to do with disease cases. I have gender, disease, year, age group, population as variables. 

The problem I am running into is : for example I have osteoporosis disease with 20,000 cases for males and 20,000 cases for females. For the "all" gender there is missing value, how can I make this calculation? 

6 REPLIES 6
antonbcristina
SAS Employee

From the sample you showed, the only time that the All category has missing cases is when there are missing cases for both Females and Males. Is this always the case?

Greg95
Calcite | Level 5

Yes, there is only a missing value in all if there is a value for both male and female.

If there is a value missing for females then there is a value for male and all for that age group/disease category.

antonbcristina
SAS Employee

I think you meant to say:

 

Yes, there is only a missing value in all if there is a missing value for both male and female.

 

In this case, there is nothing that you can do to fix this. There is no way to calculate the All category without individual cases for either Female or Male.

Greg95
Calcite | Level 5

For each missing "All" value, the male and female values are there. 

For each missing "Female" values, the all and male values are there.

For each missing "Male" value, the all and female values are there.

antonbcristina
SAS Employee

This doesn't seem to be reflected in your sample. Is your data different?

antonbcristina
SAS Employee

I've recreated a small sample of your sample here:

 

data diseases;
   infile datalines dlm=' ' dsd;
   input year age $ gender $ disease :$50. cases pop;
datalines;
1990 0-17 All Asthma 182000 64177000
1990 0-17 All "Dementia and Alzheimer's disease" . 64177000
1990 0-17 Female Asthma 71000 31295000
1990 0-17 Female "Dementia and Alzheimer's disease" 22000 31295000
1990 0-17 Male Asthma 111000 32883000
1990 0-17 Male "Dementia and Alzheimer's disease" . 32883000
run;

 

So if you do have a scenario as you mentioned, the easiest way to validate your data is to transpose it and then check that all = sum of females and males, as below. If it isn't, then make it so.

 

proc sort data=diseases;
   by year age disease;
run;

proc transpose data=diseases out=diseases_t;
   by year age disease;
   var cases pop;
   id gender;
run;

data diseases_validated;
   set diseases_t;
   valid = (all=sum(female,male));
run;

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 815 views
  • 0 likes
  • 2 in conversation