Solved: Replace missing values based on by variable

David_Billa · Posted 08-25-2020 10:45 AM

I've the data as below.Now I want to replace missing values from the variables (PE,CR,AA and CO) with the the value of the respective variable based on by variable(Code) . Any help?

Code	PE	CR	AA	CO
AS3	0.1285	2.152	.	.
AS3	0.1285	.	.	.
AS3	0.1285	.	.	.
AS3	0.1285	2.152	.	.
AS3	0.1285	.	0.0342	0.0331
AS3	0.1285	.	0.0342	0.0331
AS3	0.1285	2.152	.	.
AS4	0.1329	2.013	0.0326	0.0324
AS4	0.1329	.	0.0326	0.0324
AS4	0.1329	.	0.0326	0.0324

Desired result is,

Code	PE	CR	AA	CO
AS3	0.1285	2.152	0.0342	0.0331
AS3	0.1285	2.152	0.0342	0.0331
AS3	0.1285	2.152	0.0342	0.0331
AS3	0.1285	2.152	0.0342	0.0331
AS3	0.1285	2.152	0.0342	0.0331
AS3	0.1285	2.152	0.0342	0.0331
AS3	0.1285	2.152	0.0342	0.0331
AS4	0.1329	2.013	0.0326	0.0324
AS4	0.1329	2.013	0.0326	0.0324
AS4	0.1329	2.013	0.0326	0.0324

Tom · Posted 08-25-2020 11:30 AM

As @PaigeMiller said use PROC STDIZE.

data have;
  input Code $ PE  CR  AA  CO;
cards;
AS3  0.1285  2.152  .  .
AS3  0.1285  .  .  .
AS3  0.1285  .  .  .
AS3  0.1285  2.152  .  .
AS3  0.1285  .  0.0342  0.0331
AS3  0.1285  .  0.0342  0.0331
AS3  0.1285  2.152  .  .
AS4  0.1329  2.013  0.0326  0.0324
AS4  0.1329  .  0.0326  0.0324
AS4  0.1329  .  0.0326  0.0324
;

proc stdize data=have out=want REPONLY METHOD=MEAN;
  by code;
  var pe cr aa co;
run;

You could also try this trick using WHERE clauses on different cuts of the data.

data want2;
  merge have (drop=pe cr aa co)
        have (keep=code pe where=(not missing(pe)))
        have (keep=code cr where=(not missing(cr)))
        have (keep=code aa where=(not missing(aa)))
        have (keep=code co where=(not missing(co)))
  ;
  by code;
run;

View solution in original post

PaigeMiller · Posted 08-25-2020 10:52 AM

These variables always have the exact same value on every observation of the BY variable CODE? And they change only when CODE changes? you can use PROC STDIZE with the REPONLY option and METHOD=MEAN.

--
Paige Miller

David_Billa · Posted 08-25-2020 11:11 AM

There will not be more than one different value within a same group.

PaigeMiller · Posted 08-25-2020 11:12 AM

@David_Billa wrote:
There will not be more than one different value within a same group.

Then you can use PROC STDIZE as I described.

--
Paige Miller

novinosrin · Posted 08-25-2020 10:54 AM


data have;
input Code $	PE	CR	AA	CO;
cards;
AS3	0.1285	2.152	.	.
AS3	0.1285	.	.	.
AS3	0.1285	.	.	.
AS3	0.1285	2.152	.	.
AS3	0.1285	.	0.0342	0.0331
AS3	0.1285	.	0.0342	0.0331
AS3	0.1285	2.152	.	.
AS4	0.1329	2.013	0.0326	0.0324
AS4	0.1329	.	0.0326	0.0324
AS4	0.1329	.	0.0326	0.0324
;
proc means data=have nway noprint;
 class code pe;
 var cr--co;
 output out=temp(drop=_:) max=;
run;

data want;
 merge  have(keep=code pe) temp;
 by code pe;
run;

novinosrin · Posted 08-25-2020 11:00 AM

SQL-


data have;
input Code $	PE	CR	AA	CO;
cards;
AS3	0.1285	2.152	.	.
AS3	0.1285	.	.	.
AS3	0.1285	.	.	.
AS3	0.1285	2.152	.	.
AS3	0.1285	.	0.0342	0.0331
AS3	0.1285	.	0.0342	0.0331
AS3	0.1285	2.152	.	.
AS4	0.1329	2.013	0.0326	0.0324
AS4	0.1329	.	0.0326	0.0324
AS4	0.1329	.	0.0326	0.0324
;

proc sql;
 create table want as
 select b.*
 from (select code,pe from have) a
 left join
 (select code ,pe,max(cr) as cr, max(aa) as aa,max(co) as co
 from have
 group by code,pe) b
 on a.code=b.code and a.pe=b.pe;
quit;

David_Billa · Posted 08-25-2020 11:53 AM

Why 'on' condition only in PE variable?

I have apply the same logic in PE like how I need for RC, AA and CO?

novinosrin · Posted 08-25-2020 11:57 AM

Do you expect PE variable also to have missing values like RC, AA and CO?

Oh well, I assumed CODE PE values as unique combination forming a BY group.

Okay, Please try the below-

proc sql;
 create table want as
 select b.*
 from (select code from have) a
 left join
 (select code ,max(pe) as pe,max(cr) as cr, max(aa) as aa,max(co) as co
 from have
 group by code) b
 on a.code=b.code ;
quit;

Reeza · Posted 08-25-2020 12:52 PM

PROC STDIZE does them all at once in one step and is less code than any other approach.

Kurt_Bremser · Posted 08-25-2020 11:01 AM

What if you have more than one different value for a variable within a group? Should that result in a program ERROR, or should a specific value take precedence?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

David_Billa · Posted 08-25-2020 11:11 AM

There will not be more than one different value within a same group.

Example I posted resembles the real life data

Tom · Posted 08-25-2020 11:30 AM

As @PaigeMiller said use PROC STDIZE.

data have;
  input Code $ PE  CR  AA  CO;
cards;
AS3  0.1285  2.152  .  .
AS3  0.1285  .  .  .
AS3  0.1285  .  .  .
AS3  0.1285  2.152  .  .
AS3  0.1285  .  0.0342  0.0331
AS3  0.1285  .  0.0342  0.0331
AS3  0.1285  2.152  .  .
AS4  0.1329  2.013  0.0326  0.0324
AS4  0.1329  .  0.0326  0.0324
AS4  0.1329  .  0.0326  0.0324
;

proc stdize data=have out=want REPONLY METHOD=MEAN;
  by code;
  var pe cr aa co;
run;

You could also try this trick using WHERE clauses on different cuts of the data.

data want2;
  merge have (drop=pe cr aa co)
        have (keep=code pe where=(not missing(pe)))
        have (keep=code cr where=(not missing(cr)))
        have (keep=code aa where=(not missing(aa)))
        have (keep=code co where=(not missing(co)))
  ;
  by code;
run;

Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Re: Replace missing values based on by variable

Registration is open

SAS Training: Just a Click Away