BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RandoDando
Pyrite | Level 9

I have a table with office locations, with each office having separate locations.  Each of these locations should have a county associated with them, but not all do.  How can I fill in/impute those missing counties for the locations based on the county of the other locations in the same office?  For those missing, I want to fill in the associated county info as well.  However,  if an office is associated with more than 1 county in the original data, then I want to leave the missing values in that office as be.  

 

For instance

 

Office Location County County_Name County_Population
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 .   .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 .   .

 

I want to fill in the values for Office A, but NOT Office B due to the ambiguity.

 

I strongly prefer a method which is easily reproducible when the base data is updated.  No manual input.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Another way:

proc sql;
  select a.OFFICE
       , a.LOCATION
       , coalesce(a.COUNTY,b.COUNTY) as COUNTY
       , coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
       , coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
  from HAVE a
    left join
      ( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION 
        from HAVE 
        where COUNTY 
        group by OFFICE 
        having count(unique COUNTY)=1 ) b
    on a.OFFICE=b.OFFICE;

 

View solution in original post

4 REPLIES 4
Shmuel
Garnet | Level 18

Try next, not tested, code:

proc sort data=have (where=(county ne .))
          out=tmp1 nodupkey;
  by office county;
run;

data tmp2;
 set tmp1;
  by office;
     if first.office and last.office; /* only one county */
	 rename county = tofill;
run;

data want;
 merge have(in=inh)
       tmp2 (in=int);
	by office;
	if inh and int 
and county = . then county = tofill; drop tofill; run;
Kurt_Bremser
Super User

You can do it in one data step with a double do loop:

data have;
infile datalines dlm=" " dsd truncover;
input Office $ Location County County_Name :$20. County_Population;
datalines;
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 .  .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 .  .
;

data want;
length _county_name $20;
do until (last.office);
  set have;
  by office;
  if first.office then _flag = 1;
  if _flag and _county_name = " " and county_name ne " "
  then do;
    _county_name = county_name;
    _county = county;
  end;
  if county_name ne "" and county_name ne _county_name then _flag = 0;
end;
do until (last.office);
  set have;
  by office;
  if _flag and county_name = " "
  then do;
    county_name = _county_name;
    county = _county;
  end;
  output;
end;
drop _:;
run;

proc print data=want noobs;
run;

Result:

                                County_      County_
Office    Location    County     Name      Population
  A           1         25      Harris       500000  
  A           2         25      Harris       500000  
  A           3         25      Harris            .  
  B           1         19      Bradley      200000  
  B           2         20      Orange       700000  
  B           3          .                        .  
ChrisNZ
Tourmaline | Level 20

Another way:

proc sql;
  select a.OFFICE
       , a.LOCATION
       , coalesce(a.COUNTY,b.COUNTY) as COUNTY
       , coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
       , coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
  from HAVE a
    left join
      ( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION 
        from HAVE 
        where COUNTY 
        group by OFFICE 
        having count(unique COUNTY)=1 ) b
    on a.OFFICE=b.OFFICE;

 

Ksharp
Super User
data have;
infile datalines dlm=" " dsd truncover;
input Office $ Location County County_Name :$20. County_Population;
datalines;
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 .  .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 .  .
;

proc sql;
create table want as
select *,
 case when count(distinct county_name)=1 then max(county)
  else county end as new_county,
 case when count(distinct county_name)=1 then max(county_name)
  else county_name end as new_county_name
 from have
  group by office
   order by 1,2;
quit;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1143 views
  • 0 likes
  • 5 in conversation