BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wernie
Quartz | Level 8

I have two datasets that I am merging - one with population data and one with case count data. The population dataset has data for all counties by gender and age group. The case dataset has cases for counties by gender and age group only where there were cases (does not include all states (or age/gender stratifications in population dataset). When I combine these datasets (by state, county, gender, and age), cases is set to missing anywhere there were no cases. This is fine for states where no case data were available (I'd want to delete those where all cases missing), but I want to change those missing values to 0s only for states where case data are available.

 

Example of merged dataset:

 

State          County             Gender             Age_group       Pop          Cases

01               01111                  M                     1                     1500            .

01               01111                  M                     2                      2000           .

01               01111                  M                     3                      2300           .

01               01111          Through M ages 1-18, F ages 1-18                   .

08               08244                 M                     14                    1498           28

08               08244                 M                     15                    1247           21

08               08244                 M                     16                     2736           .

08               08244                 M                     17                    1932            .

08               08244                 M                     18                     1140          11

 

So for the example above - all cases under state 01 were set to missing because the population dataset had data, but the case dataset had no data for that state. If I could somehow delete observations where all values for case are missing for all values within a state, that would be good (or have them so they don't merge). Then I would want to insert 0s where a state has data, but values are missing (e.g., above I'd want 0s for state 08 where they are missing as I know the state has data, so 0 cases for those stratifications).

 

I'd want to end up with:

 

State          County             Gender             Age_group       Pop          Cases

08               08244                 M                     14                    1498          28

08               08244                 M                     15                    1247          21

08               08244                 M                     16                     2736          0

08               08244                 M                     17                    1932           0

08               08244                 M                     18                     1140         11

 

Not sure if my data step with merge or proc sql code can be modified to include the above, but here is my code for merging (data step returns missing values as shown in example 1 above, proc sql just returns non-missing obs, but that's not what I want as I want to set missing to 0 where a state has case data):

 

data casespop;
    merge final.pop final.case (drop=YEAR);
    by State County Gender Age_group;
run;

 

proc sql;
    create table merged_casepop as
    select     case.State, case.County, case.Gender, pop.county, pop.Age_group, cases, pop
    from   final.case, final.pop

    where    case.State eq pop.State AND case.County eq pop.County AND case.Age_group eq pop.Age_group
            AND case.Gender eq pop.Gender;
quit;

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Without commenting about earlier solutions, this approach would work.  Use  your final merged data set:

 

data want;

total_cases=0;

do until (last.state);

   set merged_casepop;

   by state;

   total_cases + cases;

end;

do until (last.state);

   set merged_casepop;

   by state;

   if total_cases then do;

      if cases = . then cases = 0;

      output;

   end;

end;

drop total_cases;

run;

View solution in original post

5 REPLIES 5
art297
Opal | Level 21

If I correctly understand what you're trying to do, the first part should be easy to control using the in= option. e.g.:

data casespop (drop=in_);
    merge final.pop (in=in_pop) final.case (in=in_case drop=YEAR);
    by State County Gender Age_group;
    if in_case and in_pop;
run;

That would delete those records that weren't in both files. Then, setting missing to zero should be easy in a subsequent datastep or incorporating it into your sql case statement.

 

Art, CEO, AnalystFinder.com

 

wernie
Quartz | Level 8

Thanks, Art! @art297 I tried that and that seems to be the same thing that I get with the proc sql code (creates dataset without records not in both files). That doesn't give me the missing values I need for states that *do* have data, so it created a dataset without any missing values for cases, meaning I can't set them to 0. I tried @kiranv_'s suggestion, but couldn't seem to get that to work (I'm not a SAS pro!). @Astounding's suggestion is what I needed - using my already merged dataset to get 0s for states that do have data and get rid of states with all 0s (meaning they had no data). Thanks for the help everyone! Smiley Happy

kiranv_
Rhodochrosite | Level 12

below query should work.

 

proc sql;
create table want as 
select a.id,
       case when a.val = . then 0 else a.val end as val,
        b.val2 
from 
(select id, val from table1 a)
inner join
(select id, val2
from table2 b
group by id
having sum(val) ne .)/* this will not pick records which are not missing for complete id*/
on a.id =b.id;/* you can extend this values*/
quit;
Astounding
PROC Star

Without commenting about earlier solutions, this approach would work.  Use  your final merged data set:

 

data want;

total_cases=0;

do until (last.state);

   set merged_casepop;

   by state;

   total_cases + cases;

end;

do until (last.state);

   set merged_casepop;

   by state;

   if total_cases then do;

      if cases = . then cases = 0;

      output;

   end;

end;

drop total_cases;

run;

wernie
Quartz | Level 8
Worked like a charm - thanks! I'll have to read up a bit more on do loops and understanding what statements like 'if total_cases then do' mean.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1808 views
  • 1 like
  • 4 in conversation