Solved: Set missing to zero only for certain observations?

wernie · Posted 06-06-2017 12:54 PM

I have two datasets that I am merging - one with population data and one with case count data. The population dataset has data for all counties by gender and age group. The case dataset has cases for counties by gender and age group only where there were cases (does not include all states (or age/gender stratifications in population dataset). When I combine these datasets (by state, county, gender, and age), cases is set to missing anywhere there were no cases. This is fine for states where no case data were available (I'd want to delete those where all cases missing), but I want to change those missing values to 0s only for states where case data are available.

Example of merged dataset:

State County Gender Age_group Pop Cases

01 01111 M 1 1500 .

01 01111 M 2 2000 .

01 01111 M 3 2300 .

01 01111 Through M ages 1-18, F ages 1-18 .

08 08244 M 14 1498 28

08 08244 M 15 1247 21

08 08244 M 16 2736 .

08 08244 M 17 1932 .

08 08244 M 18 1140 11

So for the example above - all cases under state 01 were set to missing because the population dataset had data, but the case dataset had no data for that state. If I could somehow delete observations where all values for case are missing for all values within a state, that would be good (or have them so they don't merge). Then I would want to insert 0s where a state has data, but values are missing (e.g., above I'd want 0s for state 08 where they are missing as I know the state has data, so 0 cases for those stratifications).

I'd want to end up with:

State County Gender Age_group Pop Cases

08 08244 M 14 1498 28

08 08244 M 15 1247 21

08 08244 M 16 2736 0

08 08244 M 17 1932 0

08 08244 M 18 1140 11

Not sure if my data step with merge or proc sql code can be modified to include the above, but here is my code for merging (data step returns missing values as shown in example 1 above, proc sql just returns non-missing obs, but that's not what I want as I want to set missing to 0 where a state has case data):

data casespop;
merge final.pop final.case (drop=YEAR);
by State County Gender Age_group;
run;

proc sql;
   create table merged_casepop as
   select     case.State, case.County, case.Gender, pop.county, pop.Age_group, cases, pop
   from   final.case, final.pop

where case.State eq pop.State AND case.County eq pop.County AND case.Age_group eq pop.Age_group
AND case.Gender eq pop.Gender;
quit;

Thanks!

Astounding · Posted 06-06-2017 01:29 PM

Without commenting about earlier solutions, this approach would work. Use your final merged data set:

data want;

total_cases=0;

do until (last.state);

set merged_casepop;

by state;

total_cases + cases;

end;

do until (last.state);

set merged_casepop;

by state;

if total_cases then do;

if cases = . then cases = 0;

output;

end;

drop total_cases;

run;

View solution in original post

art297 · Posted 06-06-2017 01:16 PM

If I correctly understand what you're trying to do, the first part should be easy to control using the in= option. e.g.:

data casespop (drop=in_);
    merge final.pop (in=in_pop) final.case (in=in_case drop=YEAR);
    by State County Gender Age_group;
    if in_case and in_pop;
run;

That would delete those records that weren't in both files. Then, setting missing to zero should be easy in a subsequent datastep or incorporating it into your sql case statement.

Art, CEO, AnalystFinder.com

wernie · Posted 06-06-2017 01:44 PM

Thanks, Art! @art297 I tried that and that seems to be the same thing that I get with the proc sql code (creates dataset without records not in both files). That doesn't give me the missing values I need for states that *do* have data, so it created a dataset without any missing values for cases, meaning I can't set them to 0. I tried @kiranv_'s suggestion, but couldn't seem to get that to work (I'm not a SAS pro!). @Astounding's suggestion is what I needed - using my already merged dataset to get 0s for states that do have data and get rid of states with all 0s (meaning they had no data). Thanks for the help everyone!

kiranv_ · Posted 06-06-2017 01:18 PM

below query should work.

proc sql;
create table want as 
select a.id,
       case when a.val = . then 0 else a.val end as val,
        b.val2 
from 
(select id, val from table1 a)
inner join
(select id, val2
from table2 b
group by id
having sum(val) ne .)/* this will not pick records which are not missing for complete id*/
on a.id =b.id;/* you can extend this values*/
quit;

Astounding · Posted 06-06-2017 01:29 PM

Without commenting about earlier solutions, this approach would work. Use your final merged data set:

data want;

total_cases=0;

do until (last.state);

set merged_casepop;

by state;

total_cases + cases;

end;

do until (last.state);

set merged_casepop;

by state;

if total_cases then do;

if cases = . then cases = 0;

output;

end;

drop total_cases;

run;

wernie · Posted 06-06-2017 01:45 PM

Worked like a charm - thanks! I'll have to read up a bit more on do loops and understanding what statements like 'if total_cases then do' mean.

Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Re: Set missing to zero only for certain observations?

Ready to join fellow brilliant minds for the SAS Hackathon?