I would like to replace missing value with closest value from the same DMRN, but the dx_date should be less than 365 days.
This is how my data look like:
DMRN dx_date BMI
30 04FEB2013 35.82
30 04JUN2013 .
30 04NOV2014 26.55
25 22JAN2016 30.42
25 25MAR2022 .
This is how I want it to be;
DMRN dx_date BMI
30 04FEB2013 35.82
30 04JUN2013 35.82
30 04NOV2013 26.55
25 22JAN2016 30.42
25 25MAR2022 .
What codes should I use?
If you don't have a lot of data with a lot of missing values then below should suffice.
data have;
input DMRN dx_date:date9. BMI;
format dx_date date9.;
datalines;
30 04FEB2013 .
30 04MAR2013 99.10
30 03JUN2013 1
30 04JUN2013 .
30 05JUN2013 2
30 06JUN2013 .
30 07JUN2013 .
30 08JUN2013 3
30 04NOV2014 26.55
25 22JAN2016 30.42
25 25MAR2022 .
;
proc sql;
select
l.*,
case
when missing(l.bmi) then (select r.bmi
from
(
select r.dx_date,r.bmi
from have r
where r.dmrn=l.dmrn and not missing(r.bmi)
having abs(r.dx_date-l.dx_date)=min(abs(r.dx_date-l.dx_date))
and abs(r.dx_date-l.dx_date)<365
) r
having min(r.dx_date)=r.dx_date
)
else l.bmi
end as bmi_2
from have l
order by l.dmrn, l.dx_date
;
quit;
If you need time series data in the end then consider using Proc Expand instead (requires SAS/ETS licensed).
If you don't have a lot of data with a lot of missing values then below should suffice.
data have;
input DMRN dx_date:date9. BMI;
format dx_date date9.;
datalines;
30 04FEB2013 .
30 04MAR2013 99.10
30 03JUN2013 1
30 04JUN2013 .
30 05JUN2013 2
30 06JUN2013 .
30 07JUN2013 .
30 08JUN2013 3
30 04NOV2014 26.55
25 22JAN2016 30.42
25 25MAR2022 .
;
proc sql;
select
l.*,
case
when missing(l.bmi) then (select r.bmi
from
(
select r.dx_date,r.bmi
from have r
where r.dmrn=l.dmrn and not missing(r.bmi)
having abs(r.dx_date-l.dx_date)=min(abs(r.dx_date-l.dx_date))
and abs(r.dx_date-l.dx_date)<365
) r
having min(r.dx_date)=r.dx_date
)
else l.bmi
end as bmi_2
from have l
order by l.dmrn, l.dx_date
;
quit;
If you need time series data in the end then consider using Proc Expand instead (requires SAS/ETS licensed).
Do you want to only carry forward from previous events, or also backward from future events, if these are nearer?
@Tom pointed out that my original code did not accomodate situations where a DMRN group begins with a missing BMI value. So here is a data step approach that requires presorted the data by DMRN,DX_DATE.
data have;
input DMRN dx_date:date9. BMI;
format dx_date date9.;
datalines;
30 04FEB2013 .
30 04MAR2013 99.10
30 03JUN2013 1
30 04JUN2013 .
30 05JUN2013 2
30 06JUN2013 .
30 07JUN2013 .
30 08JUN2013 3
30 04NOV2014 26.55
25 22JAN2016 30.42
25 25MAR2022 .
;
proc sort data=have;
by dmrn dx_date;
run;
data non_Missing_windows /view=non_Missing_windows;
set have (rename=(dx_date=nm_dte bmi=nm_bmi) where=(nm_bmi^=.)); /*Read only non-missing BMI's*/
by dmrn;
dx_date=nm_dte - ifn(first.dmrn,365,min(365,0.5*dif(nm_dte)));
format dx_date date9.;
run;
data want (drop=nm_:);
set have (in=inhave)
non_missing_windows (in=nmw keep=dmrn dx_date) ;
by dmrn dx_date;
if nmw then set non_missing_windows (keep=nm_dte nm_bmi);
else if first.dmrn then call missing(of nm_:);
if inhave;
new_bmi=ifn(bmi=. and dx_date-365<=nm_dte,nm_bmi,bmi);
run;
The code below is the original response, but does not properly process DMRN's that start with a missing BMI:
data have;
input DMRN dx_date :date9. BMI;
format dx_date date9.;
datalines;
30 04FEB2013 35.82
30 04JUN2013 .
30 04NOV2014 26.55
25 22JAN2016 30.42
25 25MAR2022 .
run;
data want (drop=prv_: nxt_:);
set have;
by dmrn notsorted;
if bmi^=. then do; /*If non-missing BMI, update PRV_ and NXT_ */
new_bmi=bmi;
merge have (firstobs=1 rename=(dmrn=prv_dmrn dx_date=prv_dte bmi=prv_bmi) where=(prv_bmi^=.))
have (firstobs=2 rename=(dmrn=nxt_dmrn dx_date=nxt_dte bmi=nxt_bmi) where=(nxt_bmi^=.)) ;
end;
else do;
/* Eliminate Non-eligible or ancient history*/
if prv_dmrn^=dmrn or prv_dte<dx_date-365 then call missing(of prv_:);
/* Copy PREV, if next is too far or for another DMRN */
/* Copy NEXT if it is near enough and PREV is missing */
if nxt_dmrn ^= dmrn or nxt_dte> dx_date+365 then new_bmi=prv_bmi;
else if prv_dte=. and nxt_dmrn=dmrn and nxt_dte <= dx_date+365 then new_bmi=nxt_bmi;
else do; /*But if Both prev and next qualify, compare distances */
prv_nxt_mean_dte=mean(prv_dte,nxt_dte);
if prv_nxt_mean_dte > dx_date then new_bmi=nxt_bmi;
else new_bmi=prv_bmi;
end;
end;
run;
I've created a variable NEW_BMI so you can see the before and after date step values of BMI.
The primary tool here is the conditional use of the MERGE statement.
Then, when the current BMI is missing, one can examine the PRV (previous) and NXT non-missing values to get BMI from the closest date. BTW, if the previous and next dates are equally distant, the code above takes the previous BMI.
This program assumes that each DMRN is sorted by DX_DATE.
I don't think you have thought through all of the possible combinations of missing BMI values.
For example if the first observation has a missing value of BMI then the NXT_.... variables are not populated since the MERGE never happens.
Not sure if there are others since I did not test every scenario.
@Tom wrote:
I don't think you have thought through all of the possible combinations of missing BMI values.
For example if the first observation has a missing value of BMI then the NXT_.... variables are not populated since the MERGE never happens.
Not sure if there are others since I did not test every scenario.
You're right. Revision needed.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.