Editor's note: this is a popular topic. We've edited the solution to be more helpful for those facing a similar challenge. The first part here addresses the original question, and the second part addresses a more general LOCF (last observation carried forward) approach.
You apparently will go back one observation, but will go forward until a non-missing quantity is encountered, which was not how I understood your description. In other words, if the preceding observation is missing, then your task is a Next Observation Carried Back process.
I think the best way to approach this is to read an ID/DRUG group once and store quantity values into a array (_Q below). Then reread the group and when a missing quantity is encountered, read back from the array:
data want;
array _q {0:10} _temporary_;
do N=1 by 1 until (last.drug);
set have;
by id drug notsorted;
_q{N}=quantity;
end;
do I=1 to N;
set have;
do J= I-1 to N while (quantity=.);
quantity=_q{J};
end;
if quantity=. then quantity=28;
output;
end;
drop I J N;
run;
For the LOCF approach, @sarathannapareddy suggests the following.
Using 2 SET Statements in a single datastep:
Here is the code:data l4; set l2; n=_n_; if missing(quantity) then do; do until (not missing(quantity)); n=n-1; set l2(keep=quantity) point=n; *second SET statement; end; end; run;
For here's a comparison of the original data with the output for each of these approaches. The quantity_locf column is the result of @sarathannapareddy's LOCF approach. The quantity_next_rules column shows the output of @mkeintz's array approach that takes into account the OP's business rules.
I am amazed to see retain function can do that. I thought it is only used to keep variable (Columns) in first place in newly created data set. (when used after data step, before set). What I am trying to do is instead of retaining values after noon missing values, I want to replace missing values with the leading values;
what I have
Country ID Freq total;
CAD A 290 .
CAD B 48 .
CAD C 98 436
US A 3592 .
US B 157 .
US C 1018 4767
What I want;
tot
436
436
436
4767
4767
4767
can u suggest some thing for that please.
And thanks again for telling me something totally new
Hi,
It worked.. but could you please explain why are we using ">." here and what is the use of it?
@sas_ wrote:
data l2;
input ID prescribeddate$ 3-14 drug$ quantity;
cards;
1 08/jan/2008 A 28
1 08/feb/2008 A 14
1 08/Mar/2008 A .
1 08/Apr/2008 B 56
1 08/May/2008 B .
2 08/Jan/2008 A 30
2 08/Mar/2009 B .
2 28/Mar/2008 A 30
2 08/Mar/2008 B 14
3 08/Jan/2008 B .
3 08/Mar/2009 B 56
3 08/Mar/2008 A .
4 08/jan/2008 A .
4 08/feb/2008 A .
4 08/Mar/2008 A 14
4 08/Apr/2008 B .
4 08/May/2008 B .
;
run;
data l3;
set l2;
retain tot;
if quantity>. then tot=quantity;
run;
Any actual numeric value is considered to be greater than the dot which stands for missing.
The missing value (.) is treated by SAS as a less than all non-missing numeric values, whether positive or negative. In fact it is less than other missing values, such as .A, .B, ... through .Z, but not less than ._.
Most of the time people don't bother with special missing values. In such cases the expression
X > .
is effectively the same as
not(missing(x))
@sarathannapareddy
This worked wonderfully, except I first had to sort in descending order (easy!).
Thank you!
Editor's note: this is a popular topic. We've edited the solution to be more helpful for those facing a similar challenge. The first part here addresses the original question, and the second part addresses a more general LOCF (last observation carried forward) approach.
You apparently will go back one observation, but will go forward until a non-missing quantity is encountered, which was not how I understood your description. In other words, if the preceding observation is missing, then your task is a Next Observation Carried Back process.
I think the best way to approach this is to read an ID/DRUG group once and store quantity values into a array (_Q below). Then reread the group and when a missing quantity is encountered, read back from the array:
data want;
array _q {0:10} _temporary_;
do N=1 by 1 until (last.drug);
set have;
by id drug notsorted;
_q{N}=quantity;
end;
do I=1 to N;
set have;
do J= I-1 to N while (quantity=.);
quantity=_q{J};
end;
if quantity=. then quantity=28;
output;
end;
drop I J N;
run;
For the LOCF approach, @sarathannapareddy suggests the following.
Using 2 SET Statements in a single datastep:
Here is the code:data l4; set l2; n=_n_; if missing(quantity) then do; do until (not missing(quantity)); n=n-1; set l2(keep=quantity) point=n; *second SET statement; end; end; run;
For here's a comparison of the original data with the output for each of these approaches. The quantity_locf column is the result of @sarathannapareddy's LOCF approach. The quantity_next_rules column shows the output of @mkeintz's array approach that takes into account the OP's business rules.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.