BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

I have a repeated measurement with missing data and trying to use the observation available carried backward ad forward to substitute the missing before and after the observation. 

What would the solution here?

Thanks in advance! 

data have;
input pat_id	hemoglobin;
cards;
1	.
1	123
1	.
1	124
1	125
1	121
1	.
1	126
2	120
2	220
2	.
2	.
2	.
2	201
2	.
2	.
3   .
3   .
3   .
3   .
3   150
; 

data want;
input pat_id hemoglobin;
cards;
1	123
1	123
1	123
1	124
1	125
1	121
1	121
1	126
2	120
2	220
2	220
2	220
2	220
2	201
2	201
2	201
3   150
3   150
3   150
3   150
3   150
run; 
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Carry over once, sort in reverse order, carry over a second time, re-sort to original order:

data have;
input pat_id hemoglobin;
cards;
1 .
1 123
1 .
1 124
1 125
1 121
1 .
1 126
2 120
2 220
2 .
2 .
2 .
2 201
2 .
2 .
3 .
3 .
3 .
3 .
3 150
;
run;

data int1;
set have;
by pat_id;
retain _hem;
if first.pat_id then _hem = hemoglobin;
if hemoglobin = .
then hemoglobin = _hem;
else _hem = hemoglobin;
order = _n_;
drop _hem;
run;

proc sort data=int1;
by pat_id descending order;
run;

data int2;
set int1;
by pat_id;
retain _hem;
if first.pat_id then _hem = hemoglobin;
if hemoglobin = .
then hemoglobin = _hem;
else _hem = hemoglobin;
drop _hem;
run;

proc sort
  data=int2
  out=want (drop=order)
;
by order;
run;

View solution in original post

13 REPLIES 13
Kurt_Bremser
Super User

Carry over once, sort in reverse order, carry over a second time, re-sort to original order:

data have;
input pat_id hemoglobin;
cards;
1 .
1 123
1 .
1 124
1 125
1 121
1 .
1 126
2 120
2 220
2 .
2 .
2 .
2 201
2 .
2 .
3 .
3 .
3 .
3 .
3 150
;
run;

data int1;
set have;
by pat_id;
retain _hem;
if first.pat_id then _hem = hemoglobin;
if hemoglobin = .
then hemoglobin = _hem;
else _hem = hemoglobin;
order = _n_;
drop _hem;
run;

proc sort data=int1;
by pat_id descending order;
run;

data int2;
set int1;
by pat_id;
retain _hem;
if first.pat_id then _hem = hemoglobin;
if hemoglobin = .
then hemoglobin = _hem;
else _hem = hemoglobin;
drop _hem;
run;

proc sort
  data=int2
  out=want (drop=order)
;
by order;
run;
RW9
Diamond | Level 26 RW9
Diamond | Level 26

What does this data represent?  If just shows some value and an id.  Its highly likely that you real data has more information than this, such as timepoints/datetimes or visits etc. which would make identifying where values should be imputed a lot more robust and simpler.  From what you post here, just moving the value before and after doesn't really seem the ideal method, I mean:

1	121
1	.
1	126

Should 121 be carried forward, or 126 be imputed backwards? 

Cruise
Ammonite | Level 13
Yeah, that's such a conceptual argument. I have to think of the underlying biological plausibility. But the extent of analysis I'm doing right now is not taking full advantage of the data yet. But definitely an important aspect to think about when I actually start looking at it for modelling purposes.
data_null__
Jade | Level 19

EDIT:

The method below will not work correctly if ID has all missing LOCF variables.  This will work.

 

data have;
   input pat_id hem @@;
   ohem = hem; *for check;
   cards;
1	.   1	123 1	. 1	124 1	125 1	121 1	. 1	126 
2	120 2	220 2	. 2	. 2	. 2	201 2	. 2	. 
3   . 3   . 3   . 3   . 3   150 
4 . 4 .
;;;;
   run; 
proc print;
   run;

data want;
   flag = 0;
   do until(last.pat_id); 
      set have(in=in1);
      by pat_id;
      if flag then continue;
      if not missing(hem) then do; flag=1; back=hem; end;
      end;
   do until(last.pat_id);
      update have(obs=0 drop=ohem) have(drop=ohem);
      by pat_id;
      hem = coalesce(hem,back);
      set have(keep=ohem);
      output;
      end;
   run;
proc print;
   run;

 

 

 

 

 

The uses the update trick for the LOCF and accommodates the "look back".  Ideally you would like a method that works to LOCF/BACK many variables with having to reference them in the code too often.

 

data have;
   input pat_id hem @@;
   ohem = hem; *for check;
   cards;
1	.   1	123 1	. 1	124 1	125 1	121 1	. 1	126 
2	120 2	220 2	. 2	. 2	. 2	201 2	. 2	. 
3   . 3   . 3   . 3   . 3   150 
;;;;
   run; 
proc print;
   run;
data backv / view=backv;
   set have(where=(not missing(hem)));
   by pat_id;
   if first.pat_id;
   rename hem=back;
   keep pat_id hem;
   run;
data want;
   if 0 then set have;
   do until(last.pat_id); 
      set backv;
      by pat_id;
      end;
   do until(last.pat_id);
      update have(obs=0 drop=ohem) have(drop=ohem);
      by pat_id;
      hem = coalesce(hem,back);
      set have(keep=ohem);
      output;
      end;
   drop back;
   run;
proc print;
   run;

 

Capture.PNG

Cruise
Ammonite | Level 13
Excellent. Was it more time efficient that forward-then-reverse order approach than Bremser's? Just curious to understand the cons-pros.
data_null__
Jade | Level 19

@Cruise wrote:
Excellent. Was it more time efficient that forward-then-reverse order approach than Bremser's? Just curious to understand the cons-pros.

If your data are sorted BY id as in your sample data then no sorting is needed.

 

One advantage of the update trick is you can LOCF many variables.  In fact the default would be to LOCF all variables other than ID.

 

I'm trying to figure a easy way to create a single observation of the FIRST non-missing values for each ID that is simple. Then the look back part could be easily extended to all non-id variables.

novinosrin
Tourmaline | Level 20

When Guru JKing has participated, mine is unimportant but allow me to have fun

 

data have;
input pat_id	hemoglobin;
cards;
1	.
1	123
1	.
1	124
1	125
1	121
1	.
1	126
2	120
2	220
2	.
2	.
2	.
2	201
2	.
2	.
3   .
3   .
3   .
3   .
3   150
; 

data want;
if _n_=1 then do;
dcl hash h(multidata:'y');
h.definekey('pat_id');
h.definedata('hemoglobin');
h.definedone();
end;
do  until(last.pat_id);
set have;
by pat_id;
if hemoglobin ne . then if h.check() ne 0 then rc=h.add();
end;
call missing(_h);
do until(last.pat_id);
set have;
by pat_id;
if not missing(hemoglobin) then _h=hemoglobin;
else if missing(hemoglobin) and _h then hemoglobin=_h;
else if missing(hemoglobin) then rc=h.find();
output;
end;
drop rc _h;
run;

 

 

novinosrin
Tourmaline | Level 20

Or a direct dump into hash, however i was bit concerned about the insertion order earlier and played safe with Double DOW 1. to insert 2. look up

 

data have;
input pat_id	hemoglobin;
cards;
1	.
1	123
1	.
1	124
1	125
1	121
1	.
1	126
2	120
2	220
2	.
2	.
2	.
2	201
2	.
2	.
3   .
3   .
3   .
3   .
3   150
; 
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have(where=(hemoglobin ne .))');
h.definekey('pat_id');
h.definedata('hemoglobin');
h.definedone();
end;
call missing(_h);
do until(last.pat_id);
set have;
by pat_id;
if not missing(hemoglobin) then _h=hemoglobin;
else if missing(hemoglobin) and _h then hemoglobin=_h;
else if missing(hemoglobin) then rc=h.find();
output;
end;
drop rc _h;
run;

 

novinosrin
Tourmaline | Level 20

Or with update and hash

 


data have;
input pat_id	hemoglobin;
cards;
1	.
1	123
1	.
1	124
1	125
1	121
1	.
1	126
2	120
2	220
2	.
2	.
2	.
2	201
2	.
2	.
3   .
3   .
3   .
3   .
3   150
; 

data want;
if _n_=1 then do;
if 0 then set have;
dcl hash h(dataset:'have(where=(hemoglobin ne .))');
h.definekey('pat_id');
h.definedata('hemoglobin');
h.definedone();
end;
do until(last.pat_id);
update have(obs=0) have;
by pat_id;
if missing(hemoglobin) then rc=h.find();
output;
end;
drop rc ;
run;
Cruise
Ammonite | Level 13

@novinosrin

Thanks. This is nice. 

By the way, which doens't belong to this thread though is that

I use below code of yours like a built-in function in sas in every program I use. So handy. However, I could not modify your code to deduplicate the repeated values for each individual. So I use proc nodukpey on the resulting dataset. 

 

proc sql;
create table have1(drop=r) as /*N=6,588,823 and unique kids N=2,862,721*/
select *,missing(cat_var) as flag_missing,range(calculated flag_missing) as r
from have /*N=9,876,767*/
group by child_id
having ((r=0)*visit)=min(visit) or (cat_var ne ' ')
order by child_id,visit;
quit;

 

proc sort data=have1 nodupkey out=have2;

by child_id cat_var;

run;

novinosrin
Tourmaline | Level 20

Thank you @Cruise for your acknowledgement. Answering your questions has always made me learn more and this thread being touch special as It's always pleasure and privilege to read/be part of thread when @data_null__ has participated. I shamelessly plagiarized a lot of DN's code from SAS L and continue to learn just by reading the master's posts. I am sure DN is so used to having many followers like me. My dream is to become as good as the geniuses  DN, PD, PG stats. (unrealistic eh lol)

 

With reference to your other post, I am glad you were able to modify on your own however should you need help, never hesitate. Besides our professional and personal life, we are all a SAS family here where we can freely ask, find and share. 🙂 Take care

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 4236 views
  • 15 likes
  • 5 in conversation