BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
I have a data where patients were prescribed medication. I have four columns labelled ID (for a patient), prescibeddate (date of prescription), drug ( name of the treatment) and quantity ( number of pills). In some patients I have the quantity missing. What I will like to do is to replace the missing quantity with the immediate previous quantity if the previous drug is the same in that patient. OR replace the missing quantity with the next if the next drug is the same. Otherwise the missing quantity is replaced by the standard quantity which is 28 for both treatment A and treatment B. I am working on a large data set. But as an example the original data is of the following format;

ID prescribeddate drug quantity
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A
1 08/Apr/2008 B 56
1 08/May/2008 B
2 08/Jan/2008 A 30
2 08/Mar/2009 B
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B
3 08/Mar/2009 B 56
3 08/Mar/2008 A
4 08/jan/2008 A
4 08/feb/2008 A
4 08/Mar/2008 A 14
4 08/Apr/2008 B
4 08/May/2008 B


I want the new data to look as follows:

ID prescribeddate drug quantity
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A 14
1 08/Apr/2008 B 56
1 08/May/2008 B 56
2 08/Jan/2008 A 30
2 08/Mar/2009 B 28
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B 56
3 08/Mar/2009 B 56
3 08/Mar/2008 A 28
4 08/jan/2008 A 14
4 08/feb/2008 A 14
4 08/Mar/2008 A 14
4 08/Apr/2008 B 28
4 08/May/2008 B 28

I believe i have made myself clear. Can anyone help. Thank you.
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Editor's note: this is a popular topic. We've edited the solution to be more helpful for those facing a similar challenge.  The first part here addresses the original question, and the second part addresses a more general LOCF (last observation carried forward) approach.

 

You apparently will go back one observation, but will go forward until a non-missing quantity is encountered, which was not how I understood your description.  In other words, if the preceding observation  is missing, then your task is a Next Observation Carried Back process.

 

I think the best way to approach this is to read an ID/DRUG group once and store quantity values into a array (_Q below).  Then reread the group and when a missing quantity is encountered, read back from the array:

 

data want;

  array _q {0:10} _temporary_;
  do N=1 by 1 until (last.drug); 
    set have;
    by id drug notsorted;
    _q{N}=quantity;
  end;

  do I=1 to N;
    set have;
    do J= I-1 to N while (quantity=.);
      quantity=_q{J};
    end;
    if quantity=. then quantity=28;
    output;
  end;
  drop I J N;
run;

 

  1. Notes that the lower bound  of the _Q array should be set to 0, so that "do J=I-1 to ..." works when I=1.
  2. Make sure the upper bound of array _Q will accommodate the longest series for single ID/DRUG group.

For the LOCF approach, @sarathannapareddy suggests the following.

Using 2 SET Statements in a single datastep:

Here is the code:

data l4;
 set l2;
 n=_n_;
 if missing(quantity) then
  do;
   do until (not missing(quantity));
     n=n-1;
     set l2(keep=quantity) point=n;  *second SET statement;
   end;
 end;
run;





 

For here's a comparison of the original data with the output for each of these approaches. The quantity_locf column is the result of @sarathannapareddy's LOCF approach.  The quantity_next_rules column shows the output of @mkeintz's array approach that takes into account the OP's business rules.

 

locf.png

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

11 REPLIES 11
sas_
Fluorite | Level 6
data l2;
input ID prescribeddate$ 3-14 drug$ quantity;
cards;
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A .
1 08/Apr/2008 B 56
1 08/May/2008 B .
2 08/Jan/2008 A 30
2 08/Mar/2009 B .
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B .
3 08/Mar/2009 B 56
3 08/Mar/2008 A .
4 08/jan/2008 A .
4 08/feb/2008 A .
4 08/Mar/2008 A 14
4 08/Apr/2008 B .
4 08/May/2008 B .
;
run;


data l3;
set l2;
retain tot;
if quantity>. then tot=quantity;
run;
omer2020
Obsidian | Level 7

I am amazed to see retain function can do that. I thought it is only used to keep variable (Columns) in first place in newly created data set. (when used after data step, before set). What I am trying to do is instead of retaining values after noon missing values, I want to replace missing values with the leading values;

what I have

Country ID Freq total;

CAD A 290 .

CAD B 48 .

CAD C 98 436

US A 3592 .

US B 157 .

US C 1018 4767

What I want;

tot

436

436

436

4767

4767

4767

 

can u suggest some thing for that please.

And thanks again for telling me something totally new

sravanece11
Calcite | Level 5

Hi,

 

It worked.. but could you please explain why are we using ">." here and what is the use of it?


@sas_ wrote:
data l2;
input ID prescribeddate$ 3-14 drug$ quantity;
cards;
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A .
1 08/Apr/2008 B 56
1 08/May/2008 B .
2 08/Jan/2008 A 30
2 08/Mar/2009 B .
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B .
3 08/Mar/2009 B 56
3 08/Mar/2008 A .
4 08/jan/2008 A .
4 08/feb/2008 A .
4 08/Mar/2008 A 14
4 08/Apr/2008 B .
4 08/May/2008 B .
;
run;


data l3;
set l2;
retain tot;
if quantity>. then tot=quantity;
run;

 

PaigeMiller
Diamond | Level 26

Any actual numeric value is considered to be greater than the dot which stands for missing.

--
Paige Miller
mkeintz
PROC Star

The missing value (.) is treated by SAS as a less than all non-missing numeric values, whether positive or negative.  In fact it is less than other missing values, such as   .A, .B, ... through .Z, but not less than ._.

 

Most of the time people don't bother with special missing values.  In such cases the expression

    X > .

is effectively the same as

  not(missing(x))

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sarathannapareddy
Fluorite | Level 6
Using 2 SET Statements in a single datastep:

Here is the code:

data l4;
set l2;
n=_n_;
if missing(quantity) then do;
do until (not missing(quantity));
n=n-1;
set l2(keep=quantity) point=n; *second SET statement;
end;
end;
run;

Sarath Annapareddy
http://studysas.blogspot.com/2008/10/lag-function-how-to-obtain-information.html
MelissaM
Obsidian | Level 7

@sarathannapareddy

This worked wonderfully, except I first had to sort in descending order (easy!).

Thank you!

Sandhya
Fluorite | Level 6
data MasterRec;
do until(last.drug);
set Curr_data;
by drug quantity;
if last.quantity and not missing(quantity) then do;
q = quantity;
end;
if missing(quantity) then do;
quantity=q;
end;
output;
end;
run;
mkeintz
PROC Star

Editor's note: this is a popular topic. We've edited the solution to be more helpful for those facing a similar challenge.  The first part here addresses the original question, and the second part addresses a more general LOCF (last observation carried forward) approach.

 

You apparently will go back one observation, but will go forward until a non-missing quantity is encountered, which was not how I understood your description.  In other words, if the preceding observation  is missing, then your task is a Next Observation Carried Back process.

 

I think the best way to approach this is to read an ID/DRUG group once and store quantity values into a array (_Q below).  Then reread the group and when a missing quantity is encountered, read back from the array:

 

data want;

  array _q {0:10} _temporary_;
  do N=1 by 1 until (last.drug); 
    set have;
    by id drug notsorted;
    _q{N}=quantity;
  end;

  do I=1 to N;
    set have;
    do J= I-1 to N while (quantity=.);
      quantity=_q{J};
    end;
    if quantity=. then quantity=28;
    output;
  end;
  drop I J N;
run;

 

  1. Notes that the lower bound  of the _Q array should be set to 0, so that "do J=I-1 to ..." works when I=1.
  2. Make sure the upper bound of array _Q will accommodate the longest series for single ID/DRUG group.

For the LOCF approach, @sarathannapareddy suggests the following.

Using 2 SET Statements in a single datastep:

Here is the code:

data l4;
 set l2;
 n=_n_;
 if missing(quantity) then
  do;
   do until (not missing(quantity));
     n=n-1;
     set l2(keep=quantity) point=n;  *second SET statement;
   end;
 end;
run;





 

For here's a comparison of the original data with the output for each of these approaches. The quantity_locf column is the result of @sarathannapareddy's LOCF approach.  The quantity_next_rules column shows the output of @mkeintz's array approach that takes into account the OP's business rules.

 

locf.png

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 34078 views
  • 6 likes
  • 11 in conversation