Help using Base SAS procedures

Replacing missing values by previous observation

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 16
Accepted Solution

Replacing missing values by previous observation

I have a data where patients were prescribed medication. I have four columns labelled ID (for a patient), prescibeddate (date of prescription), drug ( name of the treatment) and quantity ( number of pills). In some patients I have the quantity missing. What I will like to do is to replace the missing quantity with the immediate previous quantity if the previous drug is the same in that patient. OR replace the missing quantity with the next if the next drug is the same. Otherwise the missing quantity is replaced by the standard quantity which is 28 for both treatment A and treatment B. I am working on a large data set. But as an example the original data is of the following format;

ID prescribeddate drug quantity
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A
1 08/Apr/2008 B 56
1 08/May/2008 B
2 08/Jan/2008 A 30
2 08/Mar/2009 B
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B
3 08/Mar/2009 B 56
3 08/Mar/2008 A
4 08/jan/2008 A
4 08/feb/2008 A
4 08/Mar/2008 A 14
4 08/Apr/2008 B
4 08/May/2008 B


I want the new data to look as follows:

ID prescribeddate drug quantity
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A 14
1 08/Apr/2008 B 56
1 08/May/2008 B 56
2 08/Jan/2008 A 30
2 08/Mar/2009 B 28
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B 56
3 08/Mar/2009 B 56
3 08/Mar/2008 A 28
4 08/jan/2008 A 14
4 08/feb/2008 A 14
4 08/Mar/2008 A 14
4 08/Apr/2008 B 28
4 08/May/2008 B 28

I believe i have made myself clear. Can anyone help. Thank you.

Accepted Solutions
Highlighted
Solution
a week ago
Valued Guide
Posts: 947

Re: Replacing missing values by previous observation

[ Edited ]

Editor's note: this is a popular topic. We've edited the solution to be more helpful for those facing a similar challenge.  The first part here addresses the original question, and the second part addresses a more general LOCF (last observation carried forward) approach.

 

You apparently will go back one observation, but will go forward until a non-missing quantity is encountered, which was not how I understood your description.  In other words, if the preceding observation  is missing, then your task is a Next Observation Carried Back process.

 

I think the best way to approach this is to read an ID/DRUG group once and store quantity values into a array (_Q below).  Then reread the group and when a missing quantity is encountered, read back from the array:

 

data want;

  array _q {0:10} _temporary_;
  do N=1 by 1 until (last.drug); 
    set have;
    by id drug notsorted;
    _q{N}=quantity;
  end;

  do I=1 to N;
    set have;
    do J= I-1 to N while (quantity=.);
      quantity=_q{J};
    end;
    if quantity=. then quantity=28;
    output;
  end;
  drop I J N;
run;

 

  1. Notes that the lower bound  of the _Q array should be set to 0, so that "do J=I-1 to ..." works when I=1.
  2. Make sure the upper bound of array _Q will accommodate the longest series for single ID/DRUG group.

For the LOCF approach, @sarathannapareddy suggests the following.

Using 2 SET Statements in a single datastep:

Here is the code:

data l4;
 set l2;
 n=_n_;
 if missing(quantity) then
  do;
   do until (not missing(quantity));
     n=n-1;
     set l2(keep=quantity) point=n;  *second SET statement;
   end;
 end;
run;





 

For here's a comparison of the original data with the output for each of these approaches. The quantity_locf column is the result of @sarathannapareddy's LOCF approach.  The quantity_next_rules column shows the output of @mkeintz's array approach that takes into account the OP's business rules.

 

locf.png

View solution in original post


All Replies
Occasional Contributor
Posts: 11

Re: Replacing missing values by previous observation

you can use retain function.
Regular Contributor
Posts: 184

Re: Replacing missing values by previous observation

Contributor
Posts: 66

Re: Replacing missing values by previous observation

data l2;
input ID prescribeddate$ 3-14 drug$ quantity;
cards;
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A .
1 08/Apr/2008 B 56
1 08/May/2008 B .
2 08/Jan/2008 A 30
2 08/Mar/2009 B .
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B .
3 08/Mar/2009 B 56
3 08/Mar/2008 A .
4 08/jan/2008 A .
4 08/feb/2008 A .
4 08/Mar/2008 A 14
4 08/Apr/2008 B .
4 08/May/2008 B .
;
run;


data l3;
set l2;
retain tot;
if quantity>. then tot=quantity;
run;
Occasional Contributor
Posts: 19

Re: Replacing missing values by previous observation

I am amazed to see retain function can do that. I thought it is only used to keep variable (Columns) in first place in newly created data set. (when used after data step, before set). What I am trying to do is instead of retaining values after noon missing values, I want to replace missing values with the leading values;

what I have

Country ID Freq total;

CAD A 290 .

CAD B 48 .

CAD C 98 436

US A 3592 .

US B 157 .

US C 1018 4767

What I want;

tot

436

436

436

4767

4767

4767

 

can u suggest some thing for that please.

And thanks again for telling me something totally new

Occasional Contributor
Posts: 5

Re: Replacing missing values by previous observation

Using 2 SET Statements in a single datastep:

Here is the code:

data l4;
set l2;
n=_n_;
if missing(quantity) then do;
do until (not missing(quantity));
n=n-1;
set l2(keep=quantity) point=n; *second SET statement;
end;
end;
run;

Sarath Annapareddy
http://studysas.blogspot.com/2008/10/lag-function-how-to-obtain-information.html
Contributor
Posts: 57

Re: Replacing missing values by previous observation

data MasterRec;
do until(last.drug);
set Curr_data;
by drug quantity;
if last.quantity and not missing(quantity) then do;
q = quantity;
end;
if missing(quantity) then do;
quantity=q;
end;
output;
end;
run;
Highlighted
Solution
a week ago
Valued Guide
Posts: 947

Re: Replacing missing values by previous observation

[ Edited ]

Editor's note: this is a popular topic. We've edited the solution to be more helpful for those facing a similar challenge.  The first part here addresses the original question, and the second part addresses a more general LOCF (last observation carried forward) approach.

 

You apparently will go back one observation, but will go forward until a non-missing quantity is encountered, which was not how I understood your description.  In other words, if the preceding observation  is missing, then your task is a Next Observation Carried Back process.

 

I think the best way to approach this is to read an ID/DRUG group once and store quantity values into a array (_Q below).  Then reread the group and when a missing quantity is encountered, read back from the array:

 

data want;

  array _q {0:10} _temporary_;
  do N=1 by 1 until (last.drug); 
    set have;
    by id drug notsorted;
    _q{N}=quantity;
  end;

  do I=1 to N;
    set have;
    do J= I-1 to N while (quantity=.);
      quantity=_q{J};
    end;
    if quantity=. then quantity=28;
    output;
  end;
  drop I J N;
run;

 

  1. Notes that the lower bound  of the _Q array should be set to 0, so that "do J=I-1 to ..." works when I=1.
  2. Make sure the upper bound of array _Q will accommodate the longest series for single ID/DRUG group.

For the LOCF approach, @sarathannapareddy suggests the following.

Using 2 SET Statements in a single datastep:

Here is the code:

data l4;
 set l2;
 n=_n_;
 if missing(quantity) then
  do;
   do until (not missing(quantity));
     n=n-1;
     set l2(keep=quantity) point=n;  *second SET statement;
   end;
 end;
run;





 

For here's a comparison of the original data with the output for each of these approaches. The quantity_locf column is the result of @sarathannapareddy's LOCF approach.  The quantity_next_rules column shows the output of @mkeintz's array approach that takes into account the OP's business rules.

 

locf.png

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 16059 views
  • 4 likes
  • 8 in conversation