I have data of the following basic forma (two columns). Well call this D1 for the data set.
Index Target
A .
A .
A .
A 4
A 3
B .
B 4
B 3.5
B .
I want to replace the missing values in "Target" by the first non-missing value, and fill every missing value after the first non missing target with the previous non-missing. Here's the "end" result:
Index Target
A 4*
A 4*
A 4*
A 4
A 3
B 4*
B 4
B 3.5
B 3.5*
* is for the imputed values.
First, I started by created the following data step:
/*This step creates macro variables for every index of the first non-missing value*/
data _null_;
set D1;
retain j ind;
by index;
if _n_ = 1 then j = 0;
if first.index then ind = 0;
if Target ^= '.' and ind = 0 then do;
call symput('FirstTarget'||strip(left(j)), 'Target');
ind = 1;
end;
run;
/*Now I want to apply those macro variables to those values...and where my code doesn't work*/
data D1new;
set D1;
by index;
retain ind j LastTarget;
/*If it's not the first missing value then use the last value, which has already been imputed or already existed*/
if Target = '.' and ind = 1 then Target = LastTarget;
/* If it's the first missing value then apply the stored first non-missing value */
if Target = '.' and ind = 0 then do;
ind = 1;
Target = &&FirstTarget&j;
end;
/*For next iteration, create lagging variable that will remember the previous target value to use for imputation if the next value is missing. This takes care of the 'forward' missing values.*/
LastTarget = Target;
drop j ind;
run;
The problem lies with j. I've tried using call symput('j', j); or %let j = j; but it doesn't recognize j as a -number- but instead sees j as literally 'j' and tries to interpret &&FirstTarget&j as &FirstTargetj which doesn't exist (and won't work).
I have a larger number of indexes than two, and I figure understanding this will help me develop my SAS skills to using more than just the bare skills I have.
At worst case, I could split the data out by each Index and do it with a macro and $do i in 1 %to &n; but that seems unnecessary/overly long/non-elegant.
What am I doing wrong?
Ok a simple one-->
data int1;
set have;
by index;
retain _t;
if first.index then _t = target;
if target = .
then target = _t;
else _t = target;
order = _n_;
drop _t;
run;
proc sort data=int1;
by index descending order;
run;
data int2;
set int1;
by index;
retain _t;
if first.index then _t = target;
if target = .
then target = _t;
else _t = target;
drop _t;
run;
proc sort
data=int2
out=want (drop=order)
;
by order;
run;
data have;
input Index $ Target;
cards;
A .
A .
A .
A 4
A 3
B .
B 4
B 3.5
B .
;
data want;
if _n_=1 then do;
dcl hash h(dataset:'have(where=(Target ne .))');
h.definekey('index');
h.definedata('Target');
h.definedone();
end;
do until(last.index);
update have(obs=0) have;
by index;
if missing(Target) then rc=h.find();
output;
end;
drop rc ;
run;
Thanks for the solution. I don't understand hash though. I suppose that's up to me to go figure out how your code works.
Ok a simple one-->
data int1;
set have;
by index;
retain _t;
if first.index then _t = target;
if target = .
then target = _t;
else _t = target;
order = _n_;
drop _t;
run;
proc sort data=int1;
by index descending order;
run;
data int2;
set int1;
by index;
retain _t;
if first.index then _t = target;
if target = .
then target = _t;
else _t = target;
drop _t;
run;
proc sort
data=int2
out=want (drop=order)
;
by order;
run;
Thanks so much for your help. i haven't rant this through yet but I'm assuming it works.
Your quick responses were very helpful.
Please don't ever assume. Test thoroughly and if it doesn't work, come back to us. Simple as that!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.