I have a table with office locations, with each office having separate locations. Each of these locations should have a county associated with them, but not all do. How can I fill in/impute those missing counties for the locations based on the county of the other locations in the same office? For those missing, I want to fill in the associated county info as well. However, if an office is associated with more than 1 county in the original data, then I want to leave the missing values in that office as be.
For instance
Office | Location | County | County_Name | County_Population |
A | 1 | 25 | Harris | 500000 |
A | 2 | 25 | Harris | 500000 |
A | 3 | . | . | |
B | 1 | 19 | Bradley | 200000 |
B | 2 | 20 | Orange | 700000 |
B | 3 | . | . |
I want to fill in the values for Office A, but NOT Office B due to the ambiguity.
I strongly prefer a method which is easily reproducible when the base data is updated. No manual input.
Another way:
proc sql;
select a.OFFICE
, a.LOCATION
, coalesce(a.COUNTY,b.COUNTY) as COUNTY
, coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
, coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
from HAVE a
left join
( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION
from HAVE
where COUNTY
group by OFFICE
having count(unique COUNTY)=1 ) b
on a.OFFICE=b.OFFICE;
Try next, not tested, code:
proc sort data=have (where=(county ne .))
out=tmp1 nodupkey;
by office county;
run;
data tmp2;
set tmp1;
by office;
if first.office and last.office; /* only one county */
rename county = tofill;
run;
data want;
merge have(in=inh)
tmp2 (in=int);
by office;
if inh and int
and county = . then county = tofill;
drop tofill;
run;
You can do it in one data step with a double do loop:
data have;
infile datalines dlm=" " dsd truncover;
input Office $ Location County County_Name :$20. County_Population;
datalines;
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 . .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 . .
;
data want;
length _county_name $20;
do until (last.office);
set have;
by office;
if first.office then _flag = 1;
if _flag and _county_name = " " and county_name ne " "
then do;
_county_name = county_name;
_county = county;
end;
if county_name ne "" and county_name ne _county_name then _flag = 0;
end;
do until (last.office);
set have;
by office;
if _flag and county_name = " "
then do;
county_name = _county_name;
county = _county;
end;
output;
end;
drop _:;
run;
proc print data=want noobs;
run;
Result:
County_ County_ Office Location County Name Population A 1 25 Harris 500000 A 2 25 Harris 500000 A 3 25 Harris . B 1 19 Bradley 200000 B 2 20 Orange 700000 B 3 . .
Another way:
proc sql;
select a.OFFICE
, a.LOCATION
, coalesce(a.COUNTY,b.COUNTY) as COUNTY
, coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
, coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
from HAVE a
left join
( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION
from HAVE
where COUNTY
group by OFFICE
having count(unique COUNTY)=1 ) b
on a.OFFICE=b.OFFICE;
data have; infile datalines dlm=" " dsd truncover; input Office $ Location County County_Name :$20. County_Population; datalines; A 1 25 Harris 500000 A 2 25 Harris 500000 A 3 . . B 1 19 Bradley 200000 B 2 20 Orange 700000 B 3 . . ; proc sql; create table want as select *, case when count(distinct county_name)=1 then max(county) else county end as new_county, case when count(distinct county_name)=1 then max(county_name) else county_name end as new_county_name from have group by office order by 1,2; quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.