DATA Step, Macro, Functions and more

replace the missing value by previus value with sql and macro

Accepted Solution Solved
Reply
Contributor
Posts: 44
Accepted Solution

replace the missing value by previus value with sql and macro

Hi,

 

   i have a dataset and some vasiables have a lot of missing values. i just want to replace the missing value by the previous value.

like;

data a;
input x y z;
datalines;
1 2 .
3 . 5
4 2 6
4 . 3
5 . 7
5 1 .
5 . 0
6 . 8
;

i need to manuplate the dataset group by variable x.

 

i have tried some ways,like t he code below. and it workes ,now i am wondering if exists another methods,such as using  sql and macro to solve the problem. Thank you!

DATA a2;
   SET a;
   retain _y;
   by x;
   if x = lag(x) then do;
   if not missing(y) then _y=y;
   else y=_y;
   end;
   drop _y;
RUN;

Accepted Solutions
Solution
‎02-17-2017 04:07 AM
PROC Star
Posts: 7,364

Re: replace the missing value by previus value with sql and macro

Your current code misses some records that should have retained the previous value. The following captures those and, while you didn't say you needed it, does the same for z:

 

DATA a2;
   SET a;
   retain _y _z;
   by x;
   if not missing(y) or first.x then _y=y;
   else y=_y;
   if not missing(z) or first.x then _z=z;
   else z=_z;
   drop _:;
RUN;

HTH,

Art, CEO, AnalystFinder.com

View solution in original post


All Replies
Solution
‎02-17-2017 04:07 AM
PROC Star
Posts: 7,364

Re: replace the missing value by previus value with sql and macro

Your current code misses some records that should have retained the previous value. The following captures those and, while you didn't say you needed it, does the same for z:

 

DATA a2;
   SET a;
   retain _y _z;
   by x;
   if not missing(y) or first.x then _y=y;
   else y=_y;
   if not missing(z) or first.x then _z=z;
   else z=_z;
   drop _:;
RUN;

HTH,

Art, CEO, AnalystFinder.com

Contributor
Posts: 44

Re: replace the missing value by previus value with sql and macro

thanks! art.  i did ignore the variable z. your code will help my future cases.

as i am sorry for another question.

 

for the dataset above. i have tried sql self-join to replace the missing value by the previous value and the next-non value within a group.

proc sql;
create table a1 as
 select lookback.*, next.y as Next_y         
  from  ( select temp.*, prev.y as Prev_y         
           from a left join as prev
             on a.x = prev.x and prev.y < a.y
           group by a.countryname,
        ) as lookback 
   left join a as next
     on lookback.x = next.x and next.y > lookback.y
   group by lookback.x;
   quit;

i have tried the self-join and view. a for the original dataset and a1 is a new dataset.

however, i did not get my results. for some syntax error. and i am wondering how to insert a condition clause for missing value in this code.

maybe i did not illustrate my problem.anyway,Thanks!!

 

Contributor
Posts: 44

replace missing value with a macro

hi,

As I want to make a macro program to replace multiple variable'missing values by their previous non-missing values within a group and before that I need to sort the dataset using multiple-by variables in order to identify every observations.

 

I have a test dataset:

data temp2;
input countryname $1-5 +1 countrycode $7-8 +1 dev 1. +1 legal 1. +1 audit 1.;
datalines;
china 22 9 2 3
china 22 . . 3
china 22 7 3 .
china 21 4 . .
japan 13 3 . 1
japan 12 3 . .
japan 13 1 2 3
 ;
run;

And as you can see, I need to sort by countryname and countrycode. And replace the dev legal audit missing value. I have tried a macro like:

%macro replace(dsetin=,dsetout=,idvars=,vars=);
 %if &dsetout = %then %let dsetout = &dsetin;
 %let char=_;
 %do xi=1 %to %sysfunc(countw(&vars));
 %let var = %scan(&vars,&xi);
 %let  _VARs_=%sysfunc(cat(&char,&var));
 %end;
 %let m=%sysfunc(countw(&vars));
 %do yi=1 %to %sysfunc(countw(&idvars));
 %let idvar = %scan(&idvars,&yi);
 %let lagid = lag(&idvar);
 %end;
 %let n=%sysfunc(countw(&idvars));
 proc sort data=&dsetin out=temp1;
 by &idvars;
 run;
 data &dsetout;
 set temp1;
 by &idvars;
 array id(&n)&idvar;
 array lag(&n)&lagid;
 array vara(&m)&var;
 array vars(&m)&_VARs_;
 do yi = 1 to &n;
 if lag(yi)=id(yi) then do;
 do xi= 1 to &m;
 retain &_VARs_;
 if not missing(&var) then &_VARs_ =&var;
else &var = &_VARs_; 
end;
end;
end;
drop of &_VARs_:;
run;
%mend replace;
%replace(dsetin=temp,dsetout=temp2,idvars=countryname countrycode,vars=DEV LEGAL AUDIT)`

 

It seems tedious. I can not handle the task. Since the number of by-variables using sort may be different from the number of 'replace' variables. I hope the macro can be applied to different occasions. Sometimes the number of idvars using sort can be more than the number of 'replace' variables. Sometimes the former can be less than the latter one.

Hope it can illustrate my problems. Thanks.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 146 views
  • 0 likes
  • 2 in conversation