Editor's note: this is a popular topic. We've edited the solution to be more helpful for those facing a similar challenge. The first part here addresses the original question, and the second part addresses a more general LOCF (last observation carried forward) approach.
You apparently will go back one observation, but will go forward until a non-missing quantity is encountered, which was not how I understood your description. In other words, if the preceding observation is missing, then your task is a Next Observation Carried Back process.
I think the best way to approach this is to read an ID/DRUG group once and store quantity values into a array (_Q below). Then reread the group and when a missing quantity is encountered, read back from the array:
data want;
array _q {0:10} _temporary_;
do N=1 by 1 until (last.drug);
set have;
by id drug notsorted;
_q{N}=quantity;
end;
do I=1 to N;
set have;
do J= I-1 to N while (quantity=.);
quantity=_q{J};
end;
if quantity=. then quantity=28;
output;
end;
drop I J N;
run;
For the LOCF approach, @sarathannapareddy suggests the following.
Using 2 SET Statements in a single datastep:
Here is the code:data l4; set l2; n=_n_; if missing(quantity) then do; do until (not missing(quantity)); n=n-1; set l2(keep=quantity) point=n; *second SET statement; end; end; run;
For here's a comparison of the original data with the output for each of these approaches. The quantity_locf column is the result of @sarathannapareddy's LOCF approach. The quantity_next_rules column shows the output of @mkeintz's array approach that takes into account the OP's business rules.
I am amazed to see retain function can do that. I thought it is only used to keep variable (Columns) in first place in newly created data set. (when used after data step, before set). What I am trying to do is instead of retaining values after noon missing values, I want to replace missing values with the leading values;
what I have
Country ID Freq total;
CAD A 290 .
CAD B 48 .
CAD C 98 436
US A 3592 .
US B 157 .
US C 1018 4767
What I want;
tot
436
436
436
4767
4767
4767
can u suggest some thing for that please.
And thanks again for telling me something totally new
Hi,
It worked.. but could you please explain why are we using ">." here and what is the use of it?
@sas_ wrote:
data l2;
input ID prescribeddate$ 3-14 drug$ quantity;
cards;
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A .
1 08/Apr/2008 B 56
1 08/May/2008 B .
2 08/Jan/2008 A 30
2 08/Mar/2009 B .
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B .
3 08/Mar/2009 B 56
3 08/Mar/2008 A .
4 08/jan/2008 A .
4 08/feb/2008 A .
4 08/Mar/2008 A 14
4 08/Apr/2008 B .
4 08/May/2008 B .
;
run;
data l3;
set l2;
retain tot;
if quantity>. then tot=quantity;
run;
Any actual numeric value is considered to be greater than the dot which stands for missing.
The missing value (.) is treated by SAS as a less than all non-missing numeric values, whether positive or negative. In fact it is less than other missing values, such as .A, .B, ... through .Z, but not less than ._.
Most of the time people don't bother with special missing values. In such cases the expression
X > .
is effectively the same as
not(missing(x))
@sarathannapareddy
This worked wonderfully, except I first had to sort in descending order (easy!).
Thank you!
Editor's note: this is a popular topic. We've edited the solution to be more helpful for those facing a similar challenge. The first part here addresses the original question, and the second part addresses a more general LOCF (last observation carried forward) approach.
You apparently will go back one observation, but will go forward until a non-missing quantity is encountered, which was not how I understood your description. In other words, if the preceding observation is missing, then your task is a Next Observation Carried Back process.
I think the best way to approach this is to read an ID/DRUG group once and store quantity values into a array (_Q below). Then reread the group and when a missing quantity is encountered, read back from the array:
data want;
array _q {0:10} _temporary_;
do N=1 by 1 until (last.drug);
set have;
by id drug notsorted;
_q{N}=quantity;
end;
do I=1 to N;
set have;
do J= I-1 to N while (quantity=.);
quantity=_q{J};
end;
if quantity=. then quantity=28;
output;
end;
drop I J N;
run;
For the LOCF approach, @sarathannapareddy suggests the following.
Using 2 SET Statements in a single datastep:
Here is the code:data l4; set l2; n=_n_; if missing(quantity) then do; do until (not missing(quantity)); n=n-1; set l2(keep=quantity) point=n; *second SET statement; end; end; run;
For here's a comparison of the original data with the output for each of these approaches. The quantity_locf column is the result of @sarathannapareddy's LOCF approach. The quantity_next_rules column shows the output of @mkeintz's array approach that takes into account the OP's business rules.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.