SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Set missing to zero only for certain observations?

Accepted Solution Solved
Reply
Contributor
Posts: 34
Accepted Solution

Set missing to zero only for certain observations?

[ Edited ]

I have two datasets that I am merging - one with population data and one with case count data. The population dataset has data for all counties by gender and age group. The case dataset has cases for counties by gender and age group only where there were cases (does not include all states (or age/gender stratifications in population dataset). When I combine these datasets (by state, county, gender, and age), cases is set to missing anywhere there were no cases. This is fine for states where no case data were available (I'd want to delete those where all cases missing), but I want to change those missing values to 0s only for states where case data are available.

 

Example of merged dataset:

 

State          County             Gender             Age_group       Pop          Cases

01               01111                  M                     1                     1500            .

01               01111                  M                     2                      2000           .

01               01111                  M                     3                      2300           .

01               01111          Through M ages 1-18, F ages 1-18                   .

08               08244                 M                     14                    1498           28

08               08244                 M                     15                    1247           21

08               08244                 M                     16                     2736           .

08               08244                 M                     17                    1932            .

08               08244                 M                     18                     1140          11

 

So for the example above - all cases under state 01 were set to missing because the population dataset had data, but the case dataset had no data for that state. If I could somehow delete observations where all values for case are missing for all values within a state, that would be good (or have them so they don't merge). Then I would want to insert 0s where a state has data, but values are missing (e.g., above I'd want 0s for state 08 where they are missing as I know the state has data, so 0 cases for those stratifications).

 

I'd want to end up with:

 

State          County             Gender             Age_group       Pop          Cases

08               08244                 M                     14                    1498          28

08               08244                 M                     15                    1247          21

08               08244                 M                     16                     2736          0

08               08244                 M                     17                    1932           0

08               08244                 M                     18                     1140         11

 

Not sure if my data step with merge or proc sql code can be modified to include the above, but here is my code for merging (data step returns missing values as shown in example 1 above, proc sql just returns non-missing obs, but that's not what I want as I want to set missing to 0 where a state has case data):

 

data casespop;
    merge final.pop final.case (drop=YEAR);
    by State County Gender Age_group;
run;

 

proc sql;
    create table merged_casepop as
    select     case.State, case.County, case.Gender, pop.county, pop.Age_group, cases, pop
    from   final.case, final.pop

    where    case.State eq pop.State AND case.County eq pop.County AND case.Age_group eq pop.Age_group
            AND case.Gender eq pop.Gender;
quit;

 

Thanks!


Accepted Solutions
Solution
‎06-06-2017 01:45 PM
Super User
Posts: 5,082

Re: Set missing to zero only for certain observations?

Without commenting about earlier solutions, this approach would work.  Use  your final merged data set:

 

data want;

total_cases=0;

do until (last.state);

   set merged_casepop;

   by state;

   total_cases + cases;

end;

do until (last.state);

   set merged_casepop;

   by state;

   if total_cases then do;

      if cases = . then cases = 0;

      output;

   end;

end;

drop total_cases;

run;

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: Set missing to zero only for certain observations?

If I correctly understand what you're trying to do, the first part should be easy to control using the in= option. e.g.:

data casespop (drop=in_);
    merge final.pop (in=in_pop) final.case (in=in_case drop=YEAR);
    by State County Gender Age_group;
    if in_case and in_pop;
run;

That would delete those records that weren't in both files. Then, setting missing to zero should be easy in a subsequent datastep or incorporating it into your sql case statement.

 

Art, CEO, AnalystFinder.com

 

Contributor
Posts: 34

Re: Set missing to zero only for certain observations?

Thanks, Art! @art297 I tried that and that seems to be the same thing that I get with the proc sql code (creates dataset without records not in both files). That doesn't give me the missing values I need for states that *do* have data, so it created a dataset without any missing values for cases, meaning I can't set them to 0. I tried @kiranv_'s suggestion, but couldn't seem to get that to work (I'm not a SAS pro!). @Astounding's suggestion is what I needed - using my already merged dataset to get 0s for states that do have data and get rid of states with all 0s (meaning they had no data). Thanks for the help everyone! Smiley Happy

PROC Star
Posts: 252

Re: Set missing to zero only for certain observations?

below query should work.

 

proc sql;
create table want as 
select a.id,
       case when a.val = . then 0 else a.val end as val,
        b.val2 
from 
(select id, val from table1 a)
inner join
(select id, val2
from table2 b
group by id
having sum(val) ne .)/* this will not pick records which are not missing for complete id*/
on a.id =b.id;/* you can extend this values*/
quit;
Solution
‎06-06-2017 01:45 PM
Super User
Posts: 5,082

Re: Set missing to zero only for certain observations?

Without commenting about earlier solutions, this approach would work.  Use  your final merged data set:

 

data want;

total_cases=0;

do until (last.state);

   set merged_casepop;

   by state;

   total_cases + cases;

end;

do until (last.state);

   set merged_casepop;

   by state;

   if total_cases then do;

      if cases = . then cases = 0;

      output;

   end;

end;

drop total_cases;

run;

Contributor
Posts: 34

Re: Set missing to zero only for certain observations?

Worked like a charm - thanks! I'll have to read up a bit more on do loops and understanding what statements like 'if total_cases then do' mean.
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 158 views
  • 1 like
  • 4 in conversation