How to replace missing value with a nonmissing value within a group

CynthiaWei · Posted 10-31-2020 03:53 AM

Hi,

I have a dataset that each ID has multiple records. Within each group (ID), the variable A and B have only two values, missing and some other value.

data have;

input ID A B;

cards;

1 10 .

1 . 16

1 10 .

2 5 .

2 . 7

What I want to do is to replace the missing value with the non-missing value of the same variable within each group:

1 10 16

2 5 7

I really appreciate any help with this question.

Best regards,

C

Kurt_Bremser · Posted 10-31-2020 04:10 AM

Use a double DO loop:

data want;
do until (last.id);
  set have;
  by id;
  _a = max(_a,a);
  _b = max(_b,b);
end;
do until (last.id);
  set have;
  by id;
  a = _a;
  b = _b;
  output;
end;
drop _:;
run;

Untested, posted from my tablet.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 10-31-2020 04:22 AM

PS

PLEASE (and that's a very big PLEASE) use the "little running man" button to post code. Really. I mean it.

I positively hate to have to clean out all the iffy things (empty lines etc) that the main posting window does to code.

BTW, I managed to run the code (after cleaning up your data step, see above) on SAS on Demand, so it's tested now.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 10-31-2020 06:38 AM

If it really is the case that there are non-missing values which are always the same within an ID, or missing values (never two different non-missing values within an ID), then in my opinion, the easiest solution is to use PROC STDIZE with the REPONLY option.

proc stdize data=have reponly method=mean out=want;
by id;
var a b;
run;

--
Paige Miller

PeterClemmensen · Posted 10-31-2020 09:01 AM

data have;
input ID A B;
cards;
1 10 .
1 . 16
1 . 16
1 10 .
2 5 .
2 5 .
2 5 .
2 . 7
;

data want(drop=aa bb);
   do _N_ = 1 by 1 until (last.ID);
      set have;
      by ID;
      aa = aa <> a;
      bb = bb <> b;
   end;
   do _N_ = 1 to _N_;
      set have;
      if a = . then a = aa;
      if b = . then b = bb;
      output;
   end;
run;

Result:

The DATA to DATA Step Macro
Blog: SASnrd

novinosrin · Posted 10-31-2020 10:51 AM

Hi @CynthiaWei While it's a privilege to learn from the real "Proc Star- @PaigeMiller " who would make SAS look innocent with his mastery of Procs, with yet another demonstration of a Proc that does with utmost elegance, here is another alternative of Datastep that utilizes the UPDATE construct to fill the missing values.

data have;
input ID A B;
cards;
1 10 .
1 . 16
1 . 16
1 10 .
2 5 .
2 5 .
2 5 .
2 . 7
;

data want;
 do _n_=1 by 1 until(last.id);
  update have(obs=0) have;
  by id;
 end;
 do _n_=1 to _n_;
  output;
 end;
run;

The above technique might be a bit of learning curve when you intend to do further processing i.e. should the need is to go beyond after fill in a situation that would involve look up/matrix programming(hashes/multi dim arrays) among others for further processing.

Alternatively, the MAX operator/function is indeed the traditional way as it works for both SQL/Datastep. For one, the syntax being convenient and many users are quite familiar with SQL as is often known as "readymeals" solution,

proc sql;
 create table want as
 select a.id,A,B
 from have(keep=id) a left join
 (select id, max(a) as A, max(b) as B from have group by id) b
 on a.id=b.id;
quit;

If you are familiar with automatic remerge functionality of Proc SQL, you might be tempted to attempt that auto fill, albeit the SQL processor wouldn't see the inclusion of a non summary list variable to force or trigger that. Therefore, you would need an explicit self JOIN.

Fun stuff! Have a great day and Kind regards 🙂

How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Registration is open

How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Re: How to replace missing value with a nonmissing value within a group

Registration is open

SAS Training: Just a Click Away