BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RandoDando
Pyrite | Level 9

I have a table with office locations, with each office having separate locations.  Each of these locations should have a county associated with them, but not all do.  How can I fill in/impute those missing counties for the locations based on the county of the other locations in the same office?  For those missing, I want to fill in the associated county info as well.  However,  if an office is associated with more than 1 county in the original data, then I want to leave the missing values in that office as be.  

 

For instance

 

Office Location County County_Name County_Population
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 .   .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 .   .

 

I want to fill in the values for Office A, but NOT Office B due to the ambiguity.

 

I strongly prefer a method which is easily reproducible when the base data is updated.  No manual input.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Another way:

proc sql;
  select a.OFFICE
       , a.LOCATION
       , coalesce(a.COUNTY,b.COUNTY) as COUNTY
       , coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
       , coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
  from HAVE a
    left join
      ( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION 
        from HAVE 
        where COUNTY 
        group by OFFICE 
        having count(unique COUNTY)=1 ) b
    on a.OFFICE=b.OFFICE;

 

View solution in original post

4 REPLIES 4
Shmuel
Garnet | Level 18

Try next, not tested, code:

proc sort data=have (where=(county ne .))
          out=tmp1 nodupkey;
  by office county;
run;

data tmp2;
 set tmp1;
  by office;
     if first.office and last.office; /* only one county */
	 rename county = tofill;
run;

data want;
 merge have(in=inh)
       tmp2 (in=int);
	by office;
	if inh and int 
and county = . then county = tofill; drop tofill; run;
Kurt_Bremser
Super User

You can do it in one data step with a double do loop:

data have;
infile datalines dlm=" " dsd truncover;
input Office $ Location County County_Name :$20. County_Population;
datalines;
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 .  .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 .  .
;

data want;
length _county_name $20;
do until (last.office);
  set have;
  by office;
  if first.office then _flag = 1;
  if _flag and _county_name = " " and county_name ne " "
  then do;
    _county_name = county_name;
    _county = county;
  end;
  if county_name ne "" and county_name ne _county_name then _flag = 0;
end;
do until (last.office);
  set have;
  by office;
  if _flag and county_name = " "
  then do;
    county_name = _county_name;
    county = _county;
  end;
  output;
end;
drop _:;
run;

proc print data=want noobs;
run;

Result:

                                County_      County_
Office    Location    County     Name      Population
  A           1         25      Harris       500000  
  A           2         25      Harris       500000  
  A           3         25      Harris            .  
  B           1         19      Bradley      200000  
  B           2         20      Orange       700000  
  B           3          .                        .  
ChrisNZ
Tourmaline | Level 20

Another way:

proc sql;
  select a.OFFICE
       , a.LOCATION
       , coalesce(a.COUNTY,b.COUNTY) as COUNTY
       , coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
       , coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
  from HAVE a
    left join
      ( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION 
        from HAVE 
        where COUNTY 
        group by OFFICE 
        having count(unique COUNTY)=1 ) b
    on a.OFFICE=b.OFFICE;

 

Ksharp
Super User
data have;
infile datalines dlm=" " dsd truncover;
input Office $ Location County County_Name :$20. County_Population;
datalines;
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 .  .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 .  .
;

proc sql;
create table want as
select *,
 case when count(distinct county_name)=1 then max(county)
  else county end as new_county,
 case when count(distinct county_name)=1 then max(county_name)
  else county_name end as new_county_name
 from have
  group by office
   order by 1,2;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 631 views
  • 0 likes
  • 5 in conversation