Solved: Re: Filling missing values

dapenDaniel · Posted 10-25-2019 02:00 AM

Hi, I have a data set like below.

ID1 ID2

10001A

10001A AAA111

10001A

Now, I would like to fill the missing value. As long as the ID1 is the same, fill the missing value with the available value. The data set I want is below

ID1 ID2

10001A AAA111

Can anyone tell me what code I need to use? Thanks!

PeterClemmensen · Posted 10-25-2019 02:20 AM

I am aware. For simple data like this, you can do something like below. I added another group to ID1 to demonstrate that it handles multiple groups

data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;

data want(drop=_);
    do until (last.ID1);
        set have;
        by ID1;
        if not missing(ID2) then _=ID2;
    end;
    do until (last.ID1);
        set have;
        by ID1;
        if missing(ID2) then ID2=_;
        output;
    end;
run;

Result:

ID1      ID2
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111

The DATA to DATA Step Macro
Blog: SASnrd

View solution in original post

PeterClemmensen · Posted 10-25-2019 02:13 AM

Yes. But what if you have multiple values in ID2 for ID1=10001A?

The DATA to DATA Step Macro
Blog: SASnrd

dapenDaniel · Posted 10-25-2019 02:19 AM

For ID1=10001A, there is only one ID2

PeterClemmensen · Posted 10-25-2019 02:20 AM

I am aware. For simple data like this, you can do something like below. I added another group to ID1 to demonstrate that it handles multiple groups

data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;

data want(drop=_);
    do until (last.ID1);
        set have;
        by ID1;
        if not missing(ID2) then _=ID2;
    end;
    do until (last.ID1);
        set have;
        by ID1;
        if missing(ID2) then ID2=_;
        output;
    end;
run;

Result:

ID1      ID2
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111

The DATA to DATA Step Macro
Blog: SASnrd

Patrick · Posted 10-25-2019 02:34 AM

Here an approach using a hash lookup.

data want;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'have(keep=id1 id2 where=(not missing(id2))))');
      h1.defineKey('id1');
      h1.defineData('id2');
      h1.defineDone();
    end;
  set have;
  if missing(id2) then h1.find();
run;

Jagadishkatam · Posted 10-25-2019 05:18 AM

alternately please try

data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;

proc sort data=have;
by id1 descending id2;
run;

data want;
length id2 $10.;
set have(rename=(id2=_id2));
by id1 ;
retain id2;
if first.id1 then id2='';
if _id2 ne '' then id2=_id2;
keep id1 id2;
run;

Thanks,
Jag

mkeintz · Posted 10-25-2019 07:15 AM

I would suggest taking advantage of how the MERGE …. BY statement combination works:

data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;

data want;
  merge have (drop=id2) 
        have (keep=id1 id2 where=(id2^='')); /*Edited change */
  by id1;
run;

The value for ID2 is derived only from the second invocation of HAVE. That invocation is shorter than the first, because it is limited to cases with non-missing ID2. When you merge two datasets (i.e. 2 uses of HAVE in this case) with BY groups, then the last value for variables that are uniquely in the shorter by-group will be propagated through all the "extra" observations in the longer by-group.

Note that this assumes that all the non-missing ID2 records for a given ID1 have only a single value. I have also edited the second invocation of HAVE to keep only variables ID1 and ID2. This will ensure the any other variables will get their values from the first HAVE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

pradeepalankar · Posted 10-25-2019 07:31 AM

another try:

proc sql;
create table want as
select a.ID1 as ID1, COALESCE(a.ID2,b.ID2) as ID2
from
have a
left join
(select distinct ID1,ID2 from have where ID2 IS NOT MISSING) b
ON a.ID1=b.ID1;
quit;

novinosrin · Posted 10-25-2019 04:44 PM

Hi @dapenDaniel If there is only one distinct ID2 for an ID1 group that you clarified earlier, you are basically filling in the blanks right?

In that case , you could take advantage of the automatic remerge of PROC SQL


data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;
proc sql;
create table want(drop=_id2) as
select *,max(_ID2) as ID2
from have(rename=id2=_id2)
group by id1;
quit;