I have a table with office locations, with each office having separate locations. Each of these locations should have a county associated with them, but not all do. How can I fill in/impute those missing counties for the locations based on the county of the other locations in the same office? For those missing, I want to fill in the associated county info as well. However, if an office is associated with more than 1 county in the original data, then I want to leave the missing values in that office as be.
For instance
Office | Location | County | County_Name | County_Population |
A | 1 | 25 | Harris | 500000 |
A | 2 | 25 | Harris | 500000 |
A | 3 | . | . | |
B | 1 | 19 | Bradley | 200000 |
B | 2 | 20 | Orange | 700000 |
B | 3 | . | . |
I want to fill in the values for Office A, but NOT Office B due to the ambiguity.
I strongly prefer a method which is easily reproducible when the base data is updated. No manual input.
Another way:
proc sql;
select a.OFFICE
, a.LOCATION
, coalesce(a.COUNTY,b.COUNTY) as COUNTY
, coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
, coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
from HAVE a
left join
( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION
from HAVE
where COUNTY
group by OFFICE
having count(unique COUNTY)=1 ) b
on a.OFFICE=b.OFFICE;
Try next, not tested, code:
proc sort data=have (where=(county ne .))
out=tmp1 nodupkey;
by office county;
run;
data tmp2;
set tmp1;
by office;
if first.office and last.office; /* only one county */
rename county = tofill;
run;
data want;
merge have(in=inh)
tmp2 (in=int);
by office;
if inh and int
and county = . then county = tofill;
drop tofill;
run;
You can do it in one data step with a double do loop:
data have;
infile datalines dlm=" " dsd truncover;
input Office $ Location County County_Name :$20. County_Population;
datalines;
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 . .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 . .
;
data want;
length _county_name $20;
do until (last.office);
set have;
by office;
if first.office then _flag = 1;
if _flag and _county_name = " " and county_name ne " "
then do;
_county_name = county_name;
_county = county;
end;
if county_name ne "" and county_name ne _county_name then _flag = 0;
end;
do until (last.office);
set have;
by office;
if _flag and county_name = " "
then do;
county_name = _county_name;
county = _county;
end;
output;
end;
drop _:;
run;
proc print data=want noobs;
run;
Result:
County_ County_ Office Location County Name Population A 1 25 Harris 500000 A 2 25 Harris 500000 A 3 25 Harris . B 1 19 Bradley 200000 B 2 20 Orange 700000 B 3 . .
Another way:
proc sql;
select a.OFFICE
, a.LOCATION
, coalesce(a.COUNTY,b.COUNTY) as COUNTY
, coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
, coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
from HAVE a
left join
( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION
from HAVE
where COUNTY
group by OFFICE
having count(unique COUNTY)=1 ) b
on a.OFFICE=b.OFFICE;
data have; infile datalines dlm=" " dsd truncover; input Office $ Location County County_Name :$20. County_Population; datalines; A 1 25 Harris 500000 A 2 25 Harris 500000 A 3 . . B 1 19 Bradley 200000 B 2 20 Orange 700000 B 3 . . ; proc sql; create table want as select *, case when count(distinct county_name)=1 then max(county) else county end as new_county, case when count(distinct county_name)=1 then max(county_name) else county_name end as new_county_name from have group by office order by 1,2; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.