BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
blue_lion
Calcite | Level 5

Hi

I have a test data with missing values and I want to fill the table by taking previous month's value.

Test data

monthABC
Jan123.3
Feb4.56
Mar3..
Apr534
May.62

data I want to have

monthABC
Jan123.3
Feb4256
Mar3256
Apr534
May562

Here's the code I wrote, but it gives me the exact original table with missing data and I do not get an error message in the log.

I wonder why the lagged values are not applied according to my if statement logic..

I appreciate if someone could help me shed a light on this.

data test;

input month $ A B C;

datalines;

Jan 1 2 3.3

Feb 4 . 56

Mar 3 . .

Apr 5 3 4

May . 6 2

;

run;

data no_missing(drop=i);

set test;

array ticker(*)A--C;

do i=1 to dim(ticker);

if ticker{i}=. then ticker{i}=lag(ticker{i});

else ticker{i}=ticker{i};

end;

run;

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

LAG it not the proper tool for LOCF.  I like the UPDATE trick for your data.

data abc;
   input month $ A  B  C;
   cards;
Jan   1  2  3.3
Feb   4  .  56
Mar   3  .  .
Apr   5  3  4
May   .  6  2
;;;;
   run;
data abcV / view=abcv;
   retain dummy 1;
  
set abc;
   run;
data locf;
   update abcv(obs=0) abcv;
   by dummy;
   output;
  
drop dummy;
   run;
proc print;
  
run;

2-27-2015 10-07-39 AM.png

View solution in original post

9 REPLIES 9
blue_lion
Calcite | Level 5

I just recalled lag function in a conditional statement always returns missing value...

That's why I was getting the identical original table after running code..

but I appreciate if anyone can let me know how to code this without using lag function..

data_null__
Jade | Level 19

LAG it not the proper tool for LOCF.  I like the UPDATE trick for your data.

data abc;
   input month $ A  B  C;
   cards;
Jan   1  2  3.3
Feb   4  .  56
Mar   3  .  .
Apr   5  3  4
May   .  6  2
;;;;
   run;
data abcV / view=abcv;
   retain dummy 1;
  
set abc;
   run;
data locf;
   update abcv(obs=0) abcv;
   by dummy;
   output;
  
drop dummy;
   run;
proc print;
  
run;

2-27-2015 10-07-39 AM.png
blue_lion
Calcite | Level 5

Hi data_null,

Thank you so much for your quick response!

I came up with my revised code, but this only works if I know how many consecutive rows will be missing in advance.

Your code is much better!! Thanks again!

data no_missing(drop=i);

set test;

array ticker(*)A--C;

do i=1 to dim(ticker);

lag=lag(ticker{i});

lag=lag2(ticker{i});

if ticker{i}=. then ticker{i}=coalesce(lag,lag2);

else if ticker{i}=ticker{i};

end;

run;

Haikuo
Onyx | Level 15

Like the VIEW! Learn some!

I used to hate adding another dummy variable in this case, but not anymore with the view. Thanks for sharing, John.

Haikuo
Onyx | Level 15

Quote: "I just recalled lag function in a conditional statement always returns missing value.."

Wrong. The LAG() will return next value in the queue, missing or non-missing, conditional or non-conditional. Please read the manual.

Just like you, I like the options from data _null_ very much. Here is another alternative that maybe closer to what you are familiar with:

data test;

     input month $ A B C;

     datalines;

Jan 1 2 3.3

Feb 4 . 56

Mar 3 . .

Apr 5 3 4

May . 6 2

;

run;

data want;

     set test;

     array ticker(3) a--c;

     array _ticker(3) _temporary_;

     do i=1 to 3;

           _ticker(i)=coalesce(ticker(i), _ticker(i));

           ticker(i)=_ticker(i);

     end;

     drop i;

run;

blue_lion
Calcite | Level 5

Hi Hai.Kuo,

Thanks for your response and pointing this out!

In below doc, I read "Since LAG is called in conditional code, the value it returned is not the value from the previous observation, but is the value of its previous execution."

I also read "However, when it is executed conditionally, the LAG function only retrieves values from observations for which the condition is satisfied."

If you have any other good article on this topic, please let me know.

http://support.sas.com/resources/papers/proceedings09/055-2009.pdf

data_null__
Jade | Level 19

I think both of those statement are saying essentially the same thing. I think it is best to leave the word observation(s) out of the conversation all together.

The LAG function retrieves the value from the last time it was executed.  Or in the case of LAGn the nth time it was executed. 

Haikuo
Onyx | Level 15

I know it is confusing, and trust me, I have been there and stuck there for a not-short amount of time. SAS could do better to help users on this matter by 1. Changing the function name from Lag() to Queue() or something.

2. Doing better explanation job in their Docs, at least for the perspective of new users.

I completely agree what Data _null_ has just said in his post, here I will give you a short example in a hope to help you understand this better:

Here is the code, very simple, only focusing on the situation of LAG1, while LAGn() is pretty much the same concept.

data lag_test;

     set sashelp.class(obs=6 keep=Name);

     Uncon_queue=lag(name);

     if mod(_n_,2)=0 then

           Con_queue=lag(name);

run;


Here is the results:

lag.PNG

Explanation:

First, I want you to picture a queue, where LAG() Push value  IN and Pull value OUT, I don't know where it sits, but I suspect it is somewhere in the memory, and its size will be limited by that factor.

Second, remember that when LAG() executed, it will Pull out the last (most recent) in the queue , and Push in another one (the one in PDV) .(if LAG2(), it will pull out the Second Last in the queue, if LAG3(), it will pull the Third Last in the queue, so on so forth. But in term of Push-in, they all behave the same.)

For the Uncon_queue,

_n_=1, LAG() executed, since this is the first time it executed, there is no values in the queue. So Pull_out(output)=MISSING, Push_in=Alfred;

_n_=2, LAG() executed, Pull_out(output)=Alfred, Push_in=Alice

_n_=3, LAG() executed, Pull_out(output)=Alice, Push_in=Barbara

_n_=4, so on so forth.

So the content of this queue is: everyone from Alfred to James, it is consistent with a picture of 'lagging' observations.

For the Con_queue, very SAME thing happens, keep in mind LAG() only executed when _n_ is a even number.

_n_=1, LAG() NOT executed, so output=missing

_n_=2, LAG() executed, since this is the first time it executed, there is no values in the queue. So Pull_out(output)=MISSING, Push_in=Alice, the current on in PDV.

_n_=3, LAG() NOT executed, so output=missing

_n_=4, LAG() executed, Pull_out(output)=Alice, the last in Queue, Push_in=Carol, the current one in PDV

_n_=5, LAG() NOT executed, so output=missing

_n_=6, LAG() executed, Pull_out(output)=Carol, the last in Queue, Push_in=James, the current one in PDV

So the content of this queue is: Alice, Carol, James, where their _n_ is a even number, Obviously it is NOT consistent with picture of 'lagging' observations.

But if you take this "Queue" concept into play, LAG() is predictable and consistent, like I said before, whenever it executes, it dumps in the current one in the PDV, at the same time, pull the last one in the queue out, missing or non-missing, conditional or non-conditional.

In the real life, to make new user life easier, they are often taught not to use LAG() conditionally, because unconditionally using LAG() will be consistent with the layman concept of LAG, that is all of the obs will be put into the queue, one after another, nicely. And fortunately, most of the scenarios does not require to LAG under any condition.

But if you do have to use it conditionally, picture it as being pulling the last one from a queue, which is entirely different from pulling the previous observation.


HTH,

Haikuo


blue_lion
Calcite | Level 5

Hi Hai.Kuo and data_null_,

Thank you so much for your response!

I am at awe how helpful and knowledgeable you are and I love SAS community for this precise reason!!

I hope you are having great weekend and thanks again!! Smiley Happy

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2328 views
  • 8 likes
  • 3 in conversation