Hi
I have a test data with missing values and I want to fill the table by taking previous month's value.
Test data
month | A | B | C |
Jan | 1 | 2 | 3.3 |
Feb | 4 | . | 56 |
Mar | 3 | . | . |
Apr | 5 | 3 | 4 |
May | . | 6 | 2 |
data I want to have
month | A | B | C |
Jan | 1 | 2 | 3.3 |
Feb | 4 | 2 | 56 |
Mar | 3 | 2 | 56 |
Apr | 5 | 3 | 4 |
May | 5 | 6 | 2 |
Here's the code I wrote, but it gives me the exact original table with missing data and I do not get an error message in the log.
I wonder why the lagged values are not applied according to my if statement logic..
I appreciate if someone could help me shed a light on this.
data test;
input month $ A B C;
datalines;
Jan 1 2 3.3
Feb 4 . 56
Mar 3 . .
Apr 5 3 4
May . 6 2
;
run;
data no_missing(drop=i);
set test;
array ticker(*)A--C;
do i=1 to dim(ticker);
if ticker{i}=. then ticker{i}=lag(ticker{i});
else ticker{i}=ticker{i};
end;
run;
LAG it not the proper tool for LOCF. I like the UPDATE trick for your data.
I just recalled lag function in a conditional statement always returns missing value...
That's why I was getting the identical original table after running code..
but I appreciate if anyone can let me know how to code this without using lag function..
LAG it not the proper tool for LOCF. I like the UPDATE trick for your data.
Hi data_null,
Thank you so much for your quick response!
I came up with my revised code, but this only works if I know how many consecutive rows will be missing in advance.
Your code is much better!! Thanks again!
data no_missing(drop=i);
set test;
array ticker(*)A--C;
do i=1 to dim(ticker);
lag=lag(ticker{i});
lag=lag2(ticker{i});
if ticker{i}=. then ticker{i}=coalesce(lag,lag2);
else if ticker{i}=ticker{i};
end;
run;
Like the VIEW! Learn some!
I used to hate adding another dummy variable in this case, but not anymore with the view. Thanks for sharing, John.
Quote: "I just recalled lag function in a conditional statement always returns missing value.."
Wrong. The LAG() will return next value in the queue, missing or non-missing, conditional or non-conditional. Please read the manual.
Just like you, I like the options from data _null_ very much. Here is another alternative that maybe closer to what you are familiar with:
data test;
input month $ A B C;
datalines;
Jan 1 2 3.3
Feb 4 . 56
Mar 3 . .
Apr 5 3 4
May . 6 2
;
run;
data want;
set test;
array ticker(3) a--c;
array _ticker(3) _temporary_;
do i=1 to 3;
_ticker(i)=coalesce(ticker(i), _ticker(i));
ticker(i)=_ticker(i);
end;
drop i;
run;
Hi Hai.Kuo,
Thanks for your response and pointing this out!
In below doc, I read "Since LAG is called in conditional code, the value it returned is not the value from the previous observation, but is the value of its previous execution."
I also read "However, when it is executed conditionally, the LAG function only retrieves values from observations for which the condition is satisfied."
If you have any other good article on this topic, please let me know.
http://support.sas.com/resources/papers/proceedings09/055-2009.pdf
I think both of those statement are saying essentially the same thing. I think it is best to leave the word observation(s) out of the conversation all together.
The LAG function retrieves the value from the last time it was executed. Or in the case of LAGn the nth time it was executed.
I know it is confusing, and trust me, I have been there and stuck there for a not-short amount of time. SAS could do better to help users on this matter by 1. Changing the function name from Lag() to Queue() or something.
2. Doing better explanation job in their Docs, at least for the perspective of new users.
I completely agree what Data _null_ has just said in his post, here I will give you a short example in a hope to help you understand this better:
Here is the code, very simple, only focusing on the situation of LAG1, while LAGn() is pretty much the same concept.
data lag_test;
set sashelp.class(obs=6 keep=Name);
Uncon_queue=lag(name);
if mod(_n_,2)=0 then
Con_queue=lag(name);
run;
Here is the results:
Explanation:
First, I want you to picture a queue, where LAG() Push value IN and Pull value OUT, I don't know where it sits, but I suspect it is somewhere in the memory, and its size will be limited by that factor.
Second, remember that when LAG() executed, it will Pull out the last (most recent) in the queue , and Push in another one (the one in PDV) .(if LAG2(), it will pull out the Second Last in the queue, if LAG3(), it will pull the Third Last in the queue, so on so forth. But in term of Push-in, they all behave the same.)
For the Uncon_queue,
_n_=1, LAG() executed, since this is the first time it executed, there is no values in the queue. So Pull_out(output)=MISSING, Push_in=Alfred;
_n_=2, LAG() executed, Pull_out(output)=Alfred, Push_in=Alice
_n_=3, LAG() executed, Pull_out(output)=Alice, Push_in=Barbara
_n_=4, so on so forth.
So the content of this queue is: everyone from Alfred to James, it is consistent with a picture of 'lagging' observations.
For the Con_queue, very SAME thing happens, keep in mind LAG() only executed when _n_ is a even number.
_n_=1, LAG() NOT executed, so output=missing
_n_=2, LAG() executed, since this is the first time it executed, there is no values in the queue. So Pull_out(output)=MISSING, Push_in=Alice, the current on in PDV.
_n_=3, LAG() NOT executed, so output=missing
_n_=4, LAG() executed, Pull_out(output)=Alice, the last in Queue, Push_in=Carol, the current one in PDV
_n_=5, LAG() NOT executed, so output=missing
_n_=6, LAG() executed, Pull_out(output)=Carol, the last in Queue, Push_in=James, the current one in PDV
So the content of this queue is: Alice, Carol, James, where their _n_ is a even number, Obviously it is NOT consistent with picture of 'lagging' observations.
But if you take this "Queue" concept into play, LAG() is predictable and consistent, like I said before, whenever it executes, it dumps in the current one in the PDV, at the same time, pull the last one in the queue out, missing or non-missing, conditional or non-conditional.
In the real life, to make new user life easier, they are often taught not to use LAG() conditionally, because unconditionally using LAG() will be consistent with the layman concept of LAG, that is all of the obs will be put into the queue, one after another, nicely. And fortunately, most of the scenarios does not require to LAG under any condition.
But if you do have to use it conditionally, picture it as being pulling the last one from a queue, which is entirely different from pulling the previous observation.
HTH,
Haikuo
Hi Hai.Kuo and data_null_,
Thank you so much for your response!
I am at awe how helpful and knowledgeable you are and I love SAS community for this precise reason!!
I hope you are having great weekend and thanks again!!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.