I have two datasets that I am merging - one with population data and one with case count data. The population dataset has data for all counties by gender and age group. The case dataset has cases for counties by gender and age group only where there were cases (does not include all states (or age/gender stratifications in population dataset). When I combine these datasets (by state, county, gender, and age), cases is set to missing anywhere there were no cases. This is fine for states where no case data were available (I'd want to delete those where all cases missing), but I want to change those missing values to 0s only for states where case data are available.
Example of merged dataset:
State County Gender Age_group Pop Cases
01 01111 M 1 1500 .
01 01111 M 2 2000 .
01 01111 M 3 2300 .
01 01111 Through M ages 1-18, F ages 1-18 .
08 08244 M 14 1498 28
08 08244 M 15 1247 21
08 08244 M 16 2736 .
08 08244 M 17 1932 .
08 08244 M 18 1140 11
So for the example above - all cases under state 01 were set to missing because the population dataset had data, but the case dataset had no data for that state. If I could somehow delete observations where all values for case are missing for all values within a state, that would be good (or have them so they don't merge). Then I would want to insert 0s where a state has data, but values are missing (e.g., above I'd want 0s for state 08 where they are missing as I know the state has data, so 0 cases for those stratifications).
I'd want to end up with:
State County Gender Age_group Pop Cases
08 08244 M 14 1498 28
08 08244 M 15 1247 21
08 08244 M 16 2736 0
08 08244 M 17 1932 0
08 08244 M 18 1140 11
Not sure if my data step with merge or proc sql code can be modified to include the above, but here is my code for merging (data step returns missing values as shown in example 1 above, proc sql just returns non-missing obs, but that's not what I want as I want to set missing to 0 where a state has case data):
data casespop;
merge final.pop final.case (drop=YEAR);
by State County Gender Age_group;
run;
proc sql;
create table merged_casepop as
select case.State, case.County, case.Gender, pop.county, pop.Age_group, cases, pop
from final.case, final.pop
where case.State eq pop.State AND case.County eq pop.County AND case.Age_group eq pop.Age_group
AND case.Gender eq pop.Gender;
quit;
Thanks!
Without commenting about earlier solutions, this approach would work. Use your final merged data set:
data want;
total_cases=0;
do until (last.state);
set merged_casepop;
by state;
total_cases + cases;
end;
do until (last.state);
set merged_casepop;
by state;
if total_cases then do;
if cases = . then cases = 0;
output;
end;
end;
drop total_cases;
run;
If I correctly understand what you're trying to do, the first part should be easy to control using the in= option. e.g.:
data casespop (drop=in_); merge final.pop (in=in_pop) final.case (in=in_case drop=YEAR); by State County Gender Age_group; if in_case and in_pop; run;
That would delete those records that weren't in both files. Then, setting missing to zero should be easy in a subsequent datastep or incorporating it into your sql case statement.
Art, CEO, AnalystFinder.com
Thanks, Art! @art297 I tried that and that seems to be the same thing that I get with the proc sql code (creates dataset without records not in both files). That doesn't give me the missing values I need for states that *do* have data, so it created a dataset without any missing values for cases, meaning I can't set them to 0. I tried @kiranv_'s suggestion, but couldn't seem to get that to work (I'm not a SAS pro!). @Astounding's suggestion is what I needed - using my already merged dataset to get 0s for states that do have data and get rid of states with all 0s (meaning they had no data). Thanks for the help everyone!
below query should work.
proc sql; create table want as select a.id, case when a.val = . then 0 else a.val end as val, b.val2 from (select id, val from table1 a) inner join (select id, val2 from table2 b group by id having sum(val) ne .)/* this will not pick records which are not missing for complete id*/ on a.id =b.id;/* you can extend this values*/ quit;
Without commenting about earlier solutions, this approach would work. Use your final merged data set:
data want;
total_cases=0;
do until (last.state);
set merged_casepop;
by state;
total_cases + cases;
end;
do until (last.state);
set merged_casepop;
by state;
if total_cases then do;
if cases = . then cases = 0;
output;
end;
end;
drop total_cases;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.