BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JNWong
Calcite | Level 5

Hi,

 

   i have a dataset and some vasiables have a lot of missing values. i just want to replace the missing value by the previous value.

like;

data a;
input x y z;
datalines;
1 2 .
3 . 5
4 2 6
4 . 3
5 . 7
5 1 .
5 . 0
6 . 8
;

i need to manuplate the dataset group by variable x.

 

i have tried some ways,like t he code below. and it workes ,now i am wondering if exists another methods,such as using  sql and macro to solve the problem. Thank you!

DATA a2;
   SET a;
   retain _y;
   by x;
   if x = lag(x) then do;
   if not missing(y) then _y=y;
   else y=_y;
   end;
   drop _y;
RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Your current code misses some records that should have retained the previous value. The following captures those and, while you didn't say you needed it, does the same for z:

 

DATA a2;
   SET a;
   retain _y _z;
   by x;
   if not missing(y) or first.x then _y=y;
   else y=_y;
   if not missing(z) or first.x then _z=z;
   else z=_z;
   drop _:;
RUN;

HTH,

Art, CEO, AnalystFinder.com

View solution in original post

3 REPLIES 3
art297
Opal | Level 21

Your current code misses some records that should have retained the previous value. The following captures those and, while you didn't say you needed it, does the same for z:

 

DATA a2;
   SET a;
   retain _y _z;
   by x;
   if not missing(y) or first.x then _y=y;
   else y=_y;
   if not missing(z) or first.x then _z=z;
   else z=_z;
   drop _:;
RUN;

HTH,

Art, CEO, AnalystFinder.com

JNWong
Calcite | Level 5

thanks! art.  i did ignore the variable z. your code will help my future cases.

as i am sorry for another question.

 

for the dataset above. i have tried sql self-join to replace the missing value by the previous value and the next-non value within a group.

proc sql;
create table a1 as
 select lookback.*, next.y as Next_y         
  from  ( select temp.*, prev.y as Prev_y         
           from a left join as prev
             on a.x = prev.x and prev.y < a.y
           group by a.countryname,
        ) as lookback 
   left join a as next
     on lookback.x = next.x and next.y > lookback.y
   group by lookback.x;
   quit;

i have tried the self-join and view. a for the original dataset and a1 is a new dataset.

however, i did not get my results. for some syntax error. and i am wondering how to insert a condition clause for missing value in this code.

maybe i did not illustrate my problem.anyway,Thanks!!

 

JNWong
Calcite | Level 5

hi,

As I want to make a macro program to replace multiple variable'missing values by their previous non-missing values within a group and before that I need to sort the dataset using multiple-by variables in order to identify every observations.

 

I have a test dataset:

data temp2;
input countryname $1-5 +1 countrycode $7-8 +1 dev 1. +1 legal 1. +1 audit 1.;
datalines;
china 22 9 2 3
china 22 . . 3
china 22 7 3 .
china 21 4 . .
japan 13 3 . 1
japan 12 3 . .
japan 13 1 2 3
 ;
run;

And as you can see, I need to sort by countryname and countrycode. And replace the dev legal audit missing value. I have tried a macro like:

%macro replace(dsetin=,dsetout=,idvars=,vars=);
 %if &dsetout = %then %let dsetout = &dsetin;
 %let char=_;
 %do xi=1 %to %sysfunc(countw(&vars));
 %let var = %scan(&vars,&xi);
 %let  _VARs_=%sysfunc(cat(&char,&var));
 %end;
 %let m=%sysfunc(countw(&vars));
 %do yi=1 %to %sysfunc(countw(&idvars));
 %let idvar = %scan(&idvars,&yi);
 %let lagid = lag(&idvar);
 %end;
 %let n=%sysfunc(countw(&idvars));
 proc sort data=&dsetin out=temp1;
 by &idvars;
 run;
 data &dsetout;
 set temp1;
 by &idvars;
 array id(&n)&idvar;
 array lag(&n)&lagid;
 array vara(&m)&var;
 array vars(&m)&_VARs_;
 do yi = 1 to &n;
 if lag(yi)=id(yi) then do;
 do xi= 1 to &m;
 retain &_VARs_;
 if not missing(&var) then &_VARs_ =&var;
else &var = &_VARs_; 
end;
end;
end;
drop of &_VARs_:;
run;
%mend replace;
%replace(dsetin=temp,dsetout=temp2,idvars=countryname countrycode,vars=DEV LEGAL AUDIT)`

 

It seems tedious. I can not handle the task. Since the number of by-variables using sort may be different from the number of 'replace' variables. I hope the macro can be applied to different occasions. Sometimes the number of idvars using sort can be more than the number of 'replace' variables. Sometimes the former can be less than the latter one.

Hope it can illustrate my problems. Thanks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1117 views
  • 0 likes
  • 2 in conversation