Hi, I have a data set like below.
ID1 ID2
10001A
10001A
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A
10001A
10001A
Now, I would like to fill the missing value. As long as the ID1 is the same, fill the missing value with the available value. The data set I want is below
ID1 ID2
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
Can anyone tell me what code I need to use? Thanks!
I am aware. For simple data like this, you can do something like below. I added another group to ID1 to demonstrate that it handles multiple groups
data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A
10001A
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A
10001A
10001A
10001B
10001B
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B
10001B
10001B
;
data want(drop=_);
do until (last.ID1);
set have;
by ID1;
if not missing(ID2) then _=ID2;
end;
do until (last.ID1);
set have;
by ID1;
if missing(ID2) then ID2=_;
output;
end;
run;
Result:
ID1 ID2 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111
Yes. But what if you have multiple values in ID2 for ID1=10001A?
For ID1=10001A, there is only one ID2
I am aware. For simple data like this, you can do something like below. I added another group to ID1 to demonstrate that it handles multiple groups
data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A
10001A
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A
10001A
10001A
10001B
10001B
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B
10001B
10001B
;
data want(drop=_);
do until (last.ID1);
set have;
by ID1;
if not missing(ID2) then _=ID2;
end;
do until (last.ID1);
set have;
by ID1;
if missing(ID2) then ID2=_;
output;
end;
run;
Result:
ID1 ID2 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001A AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111 10001B AAA111
Here an approach using a hash lookup.
data want;
if _n_=1 then
do;
dcl hash h1(dataset:'have(keep=id1 id2 where=(not missing(id2))))');
h1.defineKey('id1');
h1.defineData('id2');
h1.defineDone();
end;
set have;
if missing(id2) then h1.find();
run;
alternately please try
data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A
10001A
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A
10001A
10001A
10001B
10001B
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B
10001B
10001B
;
proc sort data=have;
by id1 descending id2;
run;
data want;
length id2 $10.;
set have(rename=(id2=_id2));
by id1 ;
retain id2;
if first.id1 then id2='';
if _id2 ne '' then id2=_id2;
keep id1 id2;
run;
I would suggest taking advantage of how the MERGE …. BY statement combination works:
data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A
10001A
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A
10001A
10001A
10001B
10001B
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B
10001B
10001B
;
data want;
merge have (drop=id2)
have (keep=id1 id2 where=(id2^='')); /*Edited change */
by id1;
run;
The value for ID2 is derived only from the second invocation of HAVE. That invocation is shorter than the first, because it is limited to cases with non-missing ID2. When you merge two datasets (i.e. 2 uses of HAVE in this case) with BY groups, then the last value for variables that are uniquely in the shorter by-group will be propagated through all the "extra" observations in the longer by-group.
Note that this assumes that all the non-missing ID2 records for a given ID1 have only a single value. I have also edited the second invocation of HAVE to keep only variables ID1 and ID2. This will ensure the any other variables will get their values from the first HAVE.
another try:
proc sql;
create table want as
select a.ID1 as ID1, COALESCE(a.ID2,b.ID2) as ID2
from
have a
left join
(select distinct ID1,ID2 from have where ID2 IS NOT MISSING) b
ON a.ID1=b.ID1;
quit;
Hi @dapenDaniel If there is only one distinct ID2 for an ID1 group that you clarified earlier, you are basically filling in the blanks right?
In that case , you could take advantage of the automatic remerge of PROC SQL
data have;
input ID1 $ ID2 $;
infile datalines missover;
datalines;
10001A
10001A
10001A AAA111
10001A AAA111
10001A AAA111
10001A AAA111
10001A
10001A
10001A
10001B
10001B
10001B AAA111
10001B AAA111
10001B AAA111
10001B AAA111
10001B
10001B
10001B
;
proc sql;
create table want(drop=_id2) as
select *,max(_ID2) as ID2
from have(rename=id2=_id2)
group by id1;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.