BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dapenDaniel
Obsidian | Level 7

Hi, I have a data set like below.

 

ID1                    ID2

10001A             

10001A            

10001A            AAA111

10001A            AAA111

10001A            AAA111

10001A            AAA111

10001A

10001A

10001A

 

Now, I would like to fill the missing value. As long as the ID1 is the same, fill the missing value with the available value. The data set I want is below

 

 

ID1                    ID2

10001A            AAA111

10001A            AAA111

10001A            AAA111

10001A            AAA111

10001A            AAA111

10001A            AAA111

10001A            AAA111

10001A            AAA111

10001A            AAA111

 

Can anyone tell me what code I need to use? Thanks!

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

I am aware. For simple data like this, you can do something like below. I added another group to ID1 to demonstrate that it handles multiple groups

 

data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;

data want(drop=_);
    do until (last.ID1);
        set have;
        by ID1;
        if not missing(ID2) then _=ID2;
    end;
    do until (last.ID1);
        set have;
        by ID1;
        if missing(ID2) then ID2=_;
        output;
    end;
run;

Result:

 

ID1      ID2
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111

View solution in original post

8 REPLIES 8
PeterClemmensen
Tourmaline | Level 20

Yes. But what if you have multiple values in ID2 for ID1=10001A?

dapenDaniel
Obsidian | Level 7

For ID1=10001A, there is only one ID2

PeterClemmensen
Tourmaline | Level 20

I am aware. For simple data like this, you can do something like below. I added another group to ID1 to demonstrate that it handles multiple groups

 

data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;

data want(drop=_);
    do until (last.ID1);
        set have;
        by ID1;
        if not missing(ID2) then _=ID2;
    end;
    do until (last.ID1);
        set have;
        by ID1;
        if missing(ID2) then ID2=_;
        output;
    end;
run;

Result:

 

ID1      ID2
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001A   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
10001B   AAA111
Patrick
Opal | Level 21

Here an approach using a hash lookup.

data want;
  if _n_=1 then 
    do;
      dcl hash h1(dataset:'have(keep=id1 id2 where=(not missing(id2))))');
      h1.defineKey('id1');
      h1.defineData('id2');
      h1.defineDone();
    end;
  set have;
  if missing(id2) then h1.find();
run;

 

Jagadishkatam
Amethyst | Level 16

alternately please try

 

data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;

proc sort data=have;
by id1 descending id2;
run;

data want;
length id2 $10.;
set have(rename=(id2=_id2));
by id1 ;
retain id2;
if first.id1 then id2='';
if _id2 ne '' then id2=_id2;
keep id1 id2;
run;
Thanks,
Jag
mkeintz
PROC Star

I would suggest taking advantage of how the MERGE …. BY statement combination works:

 

data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;

data want;
  merge have (drop=id2) 
        have (keep=id1 id2 where=(id2^='')); /*Edited change */
  by id1;
run;

 

The value for ID2 is derived only from the second invocation of HAVE.  That invocation is shorter than the first, because it is limited to cases with non-missing ID2.  When you merge two datasets (i.e. 2 uses of HAVE in this case) with BY groups, then the last value for variables that are uniquely in the shorter by-group will be propagated through all the "extra" observations in the longer by-group.

 

Note that this assumes that all the non-missing ID2 records for a given ID1 have only a single value.  I have also edited the second invocation of HAVE to keep only variables ID1 and ID2.  This will ensure the any other variables will get their values from the first HAVE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
pradeepalankar
Obsidian | Level 7

another try: 

 

proc sql;
create table want as
select a.ID1 as ID1, COALESCE(a.ID2,b.ID2) as ID2
from
have a
left join
(select distinct ID1,ID2 from have where ID2 IS NOT MISSING) b
ON a.ID1=b.ID1;
quit;

novinosrin
Tourmaline | Level 20

Hi @dapenDaniel   If there is only one distinct ID2 for an ID1 group that you clarified earlier, you are basically filling in the blanks right?

 

In that case , you could take advantage of the automatic remerge of PROC SQL


data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A       
10001A       
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A       
10001A       
10001A       
10001B       
10001B       
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B       
10001B       
10001B       
;
proc sql;
create table want(drop=_id2) as
select *,max(_ID2) as ID2
from have(rename=id2=_id2)
group by id1;
quit;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 885 views
  • 0 likes
  • 7 in conversation