Solved: Filling in blank values

RandoDando · Posted 06-23-2020 02:56 PM

I have a table with office locations, with each office having separate locations. Each of these locations should have a county associated with them, but not all do. How can I fill in/impute those missing counties for the locations based on the county of the other locations in the same office? For those missing, I want to fill in the associated county info as well. However, if an office is associated with more than 1 county in the original data, then I want to leave the missing values in that office as be.

For instance

Office	Location	County	County_Name	County_Population
A	1	25	Harris	500000
A	2	25	Harris	500000
A	3	.		.
B	1	19	Bradley	200000
B	2	20	Orange	700000
B	3	.		.

I want to fill in the values for Office A, but NOT Office B due to the ambiguity.

I strongly prefer a method which is easily reproducible when the base data is updated. No manual input.

ChrisNZ · Posted 06-24-2020 04:58 AM

Another way:

proc sql;
  select a.OFFICE
       , a.LOCATION
       , coalesce(a.COUNTY,b.COUNTY) as COUNTY
       , coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
       , coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
  from HAVE a
    left join
      ( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION 
        from HAVE 
        where COUNTY 
        group by OFFICE 
        having count(unique COUNTY)=1 ) b
    on a.OFFICE=b.OFFICE;

High-Performance SAS Coding - Third Edition

View solution in original post

Shmuel · Posted 06-23-2020 03:19 PM

Try next, not tested, code:

proc sort data=have (where=(county ne .))
          out=tmp1 nodupkey;
  by office county;
run;

data tmp2;
 set tmp1;
  by office;
     if first.office and last.office; /* only one county */
	 rename county = tofill;
run;

data want;
 merge have(in=inh)
       tmp2 (in=int);
	by office;
	if inh and int 
     and county = . then county = tofill;
	drop tofill;
run;

Kurt_Bremser · Posted 06-24-2020 04:28 AM

You can do it in one data step with a double do loop:

data have;
infile datalines dlm=" " dsd truncover;
input Office $ Location County County_Name :$20. County_Population;
datalines;
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 .  .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 .  .
;

data want;
length _county_name $20;
do until (last.office);
  set have;
  by office;
  if first.office then _flag = 1;
  if _flag and _county_name = " " and county_name ne " "
  then do;
    _county_name = county_name;
    _county = county;
  end;
  if county_name ne "" and county_name ne _county_name then _flag = 0;
end;
do until (last.office);
  set have;
  by office;
  if _flag and county_name = " "
  then do;
    county_name = _county_name;
    county = _county;
  end;
  output;
end;
drop _:;
run;

proc print data=want noobs;
run;

Result:

                                County_      County_
Office    Location    County     Name      Population
  A           1         25      Harris       500000  
  A           2         25      Harris       500000  
  A           3         25      Harris            .  
  B           1         19      Bradley      200000  
  B           2         20      Orange       700000  
  B           3          .                        .

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ChrisNZ · Posted 06-24-2020 04:58 AM

Another way:

proc sql;
  select a.OFFICE
       , a.LOCATION
       , coalesce(a.COUNTY,b.COUNTY) as COUNTY
       , coalesce(a.COUNTY_NAME,b.COUNTY_NAME) as COUNTY_NAME
       , coalesce(a.COUNTY_POPULATION,b.COUNTY_POPULATION) as COUNTY_POPULATION
  from HAVE a
    left join
      ( select unique OFFICE, COUNTY, COUNTY_NAME, COUNTY_POPULATION 
        from HAVE 
        where COUNTY 
        group by OFFICE 
        having count(unique COUNTY)=1 ) b
    on a.OFFICE=b.OFFICE;

High-Performance SAS Coding - Third Edition

Ksharp · Posted 06-24-2020 07:41 AM

data have;
infile datalines dlm=" " dsd truncover;
input Office $ Location County County_Name :$20. County_Population;
datalines;
A 1 25 Harris 500000
A 2 25 Harris 500000
A 3 .  .
B 1 19 Bradley 200000
B 2 20 Orange 700000
B 3 .  .
;

proc sql;
create table want as
select *,
 case when count(distinct county_name)=1 then max(county)
  else county end as new_county,
 case when count(distinct county_name)=1 then max(county_name)
  else county_name end as new_county_name
 from have
  group by office
   order by 1,2;
quit;

Filling in blank values

Re: Filling in blank values

Re: Filling in blank values

Re: Filling in blank values

Re: Filling in blank values

Re: Filling in blank values

Filling in blank values

Re: Filling in blank values

Re: Filling in blank values

Re: Filling in blank values

Re: Filling in blank values

Re: Filling in blank values

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!