DATA Step, Macro, Functions and more

missing values

Accepted Solution Solved
Reply
Contributor
Posts: 55
Accepted Solution

missing values

[ Edited ]

hi everybody,

I know that every time you find me the right solution, so I do not delay to contact you with my best regards.

I have a dataset as:

 

ID     P  NC MG MP

M1    1  1    .       .

M1    1  2    2.3   .

M1    1  3    2.6   3.2

M1    1  4    3.5   4.1

M1    2  1    .       1.2

M1    2  2    .       2.3

M1    2  3    4.3   3.5

M2    1  1    .       .

M2    1  2    .       .

M2    1  3    3.5   3.2

M2    1  4    4.1   3.3

M2    1  5    4.0   3.3

M2    2  1    .       .

M2    2  2    3.6   4.1

M2    2  3    3.4   4.0

M2    2  4    3.4   3.9

M2    3  1    4.2   4.2

M2    3  2    4.3   4.2

M2    3  3    4.2   4.1

M3    2  1    4.6   .

M3    2  2    4.8   .

M3    2  3    4.7   .

M3    2  4    4.3   .

ID represents an identifier,
P = an exercise number, can be 1, 2, 3 or 4.
NC: an order number of each test for the same ID and the same exercise number.
MG and MP are observations.
What I want is to replace the missing values of MG or MP with the next value if it exists or even with the next value after if the next value is also missing. Nothing changes if all the observations during the same exercise are missing, either for MG or MP.

Note that missing values If they exist are only observed in the two first observations for each ID at each P.
The desired result is as follows:

 

ID     P  NC MG MP

M1    1  1    2.3   3.2

M1    1  2    2.3   3.2

M1    1  3    2.6   3.2

M1    1  4    3.5   4.1

M1    2  1    4.3   1.2

M1    2  2    4.3   2.3

M1    2  3    4.3   3.5

M2    1  1    3.5   3.2

M2    1  2    3.5   3.2

M2    1  3    3.5   3.2

M2    1  4    4.1   3.3

M2    1  5    4.0   3.3

M2    2  1    3.6   4.1

M2    2  2    3.6   4.1

M2    2  3    3.4   4.0

M2    2  4    3.4   3.9

M2    3  1    4.2   4.2

M2    3  2    4.3   4.2

M2    3  3    4.2   4.1

M3    2  1    4.6   .

M3    2  2    4.8   .

M3    2  3    4.7   .

M3    2  4    4.3   .

I count on your help, thank you.


Accepted Solutions
Solution
‎04-17-2017 05:37 PM
Super User
Posts: 5,069

Re: missing values

Easiest would probably be to sort the data set in reverse order, and bring forward the most recent nonmissing value.  For example:

 

proc sort data=have;

by id p descending nc;

run;

 

data want;

set have;

by id p;

retain new_mg new_mp;

if first.p then do;

   new_mg=mg;

   new_mp=mp;

end;

if mg > . then new_mg = mg;

else mg = new_mg;

if mp > . then new_mp = mp;

else mp = new_mp;

drop new_mg new_mp;

run;

 

You can always put the observations back into their original order:

 

proc sort data=want;

by id p nc;

run;

View solution in original post


All Replies
Solution
‎04-17-2017 05:37 PM
Super User
Posts: 5,069

Re: missing values

Easiest would probably be to sort the data set in reverse order, and bring forward the most recent nonmissing value.  For example:

 

proc sort data=have;

by id p descending nc;

run;

 

data want;

set have;

by id p;

retain new_mg new_mp;

if first.p then do;

   new_mg=mg;

   new_mp=mp;

end;

if mg > . then new_mg = mg;

else mg = new_mg;

if mp > . then new_mp = mp;

else mp = new_mp;

drop new_mg new_mp;

run;

 

You can always put the observations back into their original order:

 

proc sort data=want;

by id p nc;

run;

Contributor
Posts: 55

Re: missing values

Thank you very much, it works very well.

Super User
Posts: 9,662

Re: missing values

data have;
input ID  $   P  NC MG MP;
cards;
M1    1  1    .       .
M1    1  2    2.3   .
M1    1  3    2.6   3.2
M1    1  4    3.5   4.1
M1    2  1    .       1.2
M1    2  2    .       2.3
M1    2  3    4.3   3.5
M2    1  1    .       .
M2    1  2    .       .
M2    1  3    3.5   3.2
M2    1  4    4.1   3.3
M2    1  5    4.0   3.3
M2    2  1    .       .
M2    2  2    3.6   4.1
M2    2  3    3.4   4.0
M2    2  4    3.4   3.9
M2    3  1    4.2   4.2
M2    3  2    4.3   4.2
M2    3  3    4.2   4.1
M3    2  1    4.6   .
M3    2  2    4.8   .
M3    2  3    4.7   .
M3    2  4    4.3   .
;
run;

proc sort data=have;
 by ID   P  NC;
run;
data temp;
 set have;
 by ID     P ;
 retain _MG _MP;
 if first.P then call missing(_MG,_MP);
 if not missing(MG) then _MG=MG;
 if not missing(MP) then _MP=MP;
 drop MG MP;
run;

proc sort data=temp;
 by ID   P  descending NC;
run;
data want;
 set temp;
 by ID     P ;
 retain MG MP;
 if first.P then call missing(MG,MP);
 if not missing(_MG) then MG=_MG;
 if not missing(_MP) then MP=_MP;
 drop _:;
run;

proc sort data=want;
 by ID   P  NC;
run;


Contributor
Posts: 55

Re: missing values

It works well too, thanks to you and to Astounding for your help.

PROC Star
Posts: 161

Re: missing values

Oops I am late to the part,Just for fun and nothing more: 

 

data _null_;

if _n_=1 then do;

if 0 then set have;

declare hash myhash(multidata:'YES',ordered:'a');

myhash.definekey('id','p');

myhash.definedata('id','p','nc','mg','mp');

myhash.definedone( );

end;

set have end=last;

by id p;

if ((missing(mp) and missing(mg)) or (missing(mp) or missing(mg))) and myhash.check() ne 0 then rc=myhash.add();

else if myhash.check() ne 0 and not missing(mp) and not missing(mg) then rc=myhash.add();

else if myhash.check()=0 and (missing(mp) and missing(mg)) then rc=myhash.add();

else if myhash.check()=0 and ((not missing(mp) and missing(mg)) or (missing(mp) and not missing(mg))) then

do;

_mg=mg;

_mp=mp;

_nc=nc;

rc=myhash.find();

if missing(mg) then mg=_mg;

if missing(mp) then mp=_mp;

myhash.replacedup();

rc=myhash.find_next();

do while(rc=0);

if missing(mg) then mg=_mg;

if missing(mp) then mp=_mp;

myhash.replacedup();

rc=myhash.find_next();

end;

mg=_mg;

mp=_mp;

nc=_nc;

myhash.add();

end;

else if myhash.check()=0 and not missing(mp) and not missing(mg) then do;

_mg=mg;

_mp=mp;

_nc=nc;

rc=myhash.find();

if missing(mg) then mg=_mg;

if missing(mp) then mp=_mp;

myhash.replacedup();

rc=myhash.find_next();

do while(rc=0);

if missing(mg) then mg=_mg;

if missing(mp) then mp=_mp;

myhash.replacedup();

rc=myhash.find_next();

end;

mg=_mg;

mp=_mp;

nc=_nc;

myhash.add();

end;

if last then rc=myhash.output(dataset:'want');

run;

 

proc sort data=want;

by   id p nc;

run;

 

Regards,

Naveen Srinivasan

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 238 views
  • 4 likes
  • 4 in conversation