Calcite | Level 5

LAG not taking the value

So, I have a dataset

`DATA CLASS; INPUT ONE 1-2 TWO 3-4 COUNT 5-20;datalines;0 0 301 0 5412 0 15253 0 50 1 31 1 2292 1 6423 1 3;run;`

And I want to convert count to a Lower bound and an upper bound, using some pretty simple rules:

if (ONE=0) then LB=0

else LB=LAG(UB)+1

UB=LB+count -1

so, in SAS I programmed this:

data class2;

set class;

LB=LAG(UB)+1;
if (ONE=0) then LB=0;

UB =LB+count;

run;

But this just gives me

ONE TWO COUNT LB UB LB2
 0 0 30 0 30 . 1 0 541 . . . 2 0 1525 . . . 3 0 5 . . . 0 1 3 0 3 . 1 1 229 . . . 2 1 642 . . . 3 1 3 . . .

what is going on?

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

Re: LAG not taking the value

@Diemo wrote:

What I expect to end up with is a data set as follows:

ONE TWO COUNT LB UB

 0 0 30 0 30 1 0 541 31 572 2 0 1525 573 2098 3 0 5 2099 2104 0 1 3 0 3 1 1 229 4 331 2 1 642 332 974 3 1 3 975 978

I was under the impression that the lines were sent in sequentially - i.e. SAS would do line 1, then Line2 , then Line3, etc.  So, on my first iteration, LAG(UB) returns missing (as UB is all missing). But ONE==0 so therefore LB=0, then UB=LB(0) +count = 30 (which is what I get. On the second iteration, LB=LAG(UB)=30, and UB is then 31+541 = 572. And so on.

And on all the other interations it will return the missing value pushed down from the LAG() call on the previous iteration.  So LB will be set to missing.

I guess that this is where the issue is.   Why is it that LAG is not updating to the last value but keeping the missing values, even though the value of UB is changed?

The statements execute in order and do what they are designed to do.  The LAG() function call on the first assignment statement cannot look into the future and know what value is going to be assigning to the UB variable two statements later!!  LAG() just returns the value that the variable had the last time that specific LAG() function call ran.

You don't want LAG() you just want to RETAIN the value from the previous observation.

``````data class2;
set class;
lb=ub+1;
if (ONE=0) then do;
LB=0; ub=0;
end;
UB+count;
run;``````
```Obs    ONE    TWO    COUNT      lb      ub

1      0      0        30       0      30
2      1      0       541      31     571
3      2      0      1525     572    2096
4      3      0         5    2097    2101
5      0      1         3       0       3
6      1      1       229       4     232
7      2      1       642     233     874
8      3      1         3     875     877```
8 REPLIES 8
Super User

Re: LAG not taking the value

Code doesn't match the output, for example there's no LB2 mentioned anywhere in your code or source data.

So what do you have, and what do you expect and the exact code you tried please.

@Diemo wrote:

So, I have a dataset

`DATA CLASS; INPUT ONE 1-2 TWO 3-4 COUNT 5-20;datalines;0 0 301 0 5412 0 15253 0 50 1 31 1 2292 1 6423 1 3;run;`

And I want to convert count to a Lower bound and an upper bound, using some pretty simple rules:

if (ONE=0) then LB=0

else LB=LAG(UB)+1

UB=LB+count -1

so, in SAS I programmed this:

data class2;

set class;

LB=LAG(UB)+1;
if (ONE=0) then LB=0;

UB =LB+count;

run;

But this just gives me

ONE TWO COUNT LB UB LB2
 0 0 30 0 30 . 1 0 541 . . . 2 0 1525 . . . 3 0 5 . . . 0 1 3 0 3 . 1 1 229 . . . 2 1 642 . . . 3 1 3 . . .

what is going on?

Calcite | Level 5

Re: LAG not taking the value

Yeah you can ignore LB2. It is just a check to see what LAG(UB) was actually giving me (all missing values).

data class2;

set class;

LB=LAG(UB)+1;

LB2=LAG(UB);

if (ONE=0) then LB=0;

UB =LB+count;

run;

Super User

Re: LAG not taking the value

And what did you expect those statements to do?  Let's look at them.  First you make a new data step that reads from your original dataset.

``````data class2;
set class;``````

Then in the next statement.

``LB=LAG(UB)+1;``

You reference two new variables,  LB and UB.  Since LB and UB were not defined before SAS will make them numeric and give them missing values.  The LAG() function will push the current value onto a stack and return the previous value from the stack. On the first iteration the returned value will be missing since nothing has yet been pushed down. And on all the other interations it will return the missing value pushed down from the LAG() call on the previous iteration.  So LB will be set to missing.

Now on the next statement

``if (ONE=0) then LB=0;``

you conditionally set LB to zero.  So on any observations where ONE = 1 then LB will be 0.

Then

``UB =LB+count;``

you add the variable COUNT to the variable LB and store it into UB.  But since LB is missing for any observation where ONE is not equal to 1 this will force LB to be missing also.

The SAS log should show you where you are generating missing values.

```28   data class2;
29     set class;
30     LB=LAG(UB)+1;
31     if (ONE=0) then LB=0;
32     UB =LB+count;
33   run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
8 at 30:13   6 at 32:9
NOTE: There were 8 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.CLASS2 has 8 observations and 5 variables.```

What are you trying to find? The min() and max() of count?  The SUM() of count?

Calcite | Level 5

Re: LAG not taking the value

What I expect to end up with is a data set as follows:

ONE TWO COUNT LB UB

 0 0 30 0 30 1 0 541 31 572 2 0 1525 573 2098 3 0 5 2099 2104 0 1 3 0 3 1 1 229 4 331 2 1 642 332 974 3 1 3 975 978

I was under the impression that the lines were sent in sequentially - i.e. SAS would do line 1, then Line2 , then Line3, etc.  So, on my first iteration, LAG(UB) returns missing (as UB is all missing). But ONE==0 so therefore LB=0, then UB=LB(0) +count = 30 (which is what I get. On the second iteration, LB=LAG(UB)=30, and UB is then 31+541 = 572. And so on.

And on all the other interations it will return the missing value pushed down from the LAG() call on the previous iteration.  So LB will be set to missing.

I guess that this is where the issue is.   Why is it that LAG is not updating to the last value but keeping the missing values, even though the value of UB is changed?

Super User

Re: LAG not taking the value

@Diemo wrote:

What I expect to end up with is a data set as follows:

ONE TWO COUNT LB UB

 0 0 30 0 30 1 0 541 31 572 2 0 1525 573 2098 3 0 5 2099 2104 0 1 3 0 3 1 1 229 4 331 2 1 642 332 974 3 1 3 975 978

I was under the impression that the lines were sent in sequentially - i.e. SAS would do line 1, then Line2 , then Line3, etc.  So, on my first iteration, LAG(UB) returns missing (as UB is all missing). But ONE==0 so therefore LB=0, then UB=LB(0) +count = 30 (which is what I get. On the second iteration, LB=LAG(UB)=30, and UB is then 31+541 = 572. And so on.

And on all the other interations it will return the missing value pushed down from the LAG() call on the previous iteration.  So LB will be set to missing.

I guess that this is where the issue is.   Why is it that LAG is not updating to the last value but keeping the missing values, even though the value of UB is changed?

The statements execute in order and do what they are designed to do.  The LAG() function call on the first assignment statement cannot look into the future and know what value is going to be assigning to the UB variable two statements later!!  LAG() just returns the value that the variable had the last time that specific LAG() function call ran.

You don't want LAG() you just want to RETAIN the value from the previous observation.

``````data class2;
set class;
lb=ub+1;
if (ONE=0) then do;
LB=0; ub=0;
end;
UB+count;
run;``````
```Obs    ONE    TWO    COUNT      lb      ub

1      0      0        30       0      30
2      1      0       541      31     571
3      2      0      1525     572    2096
4      3      0         5    2097    2101
5      0      1         3       0       3
6      1      1       229       4     232
7      2      1       642     233     874
8      3      1         3     875     877```
Calcite | Level 5

Re: LAG not taking the value

Hmm, ok. So what does LAG() actually do (I was under the impression that LAG() was how you retained a value for future processing, but that seems to be incorrect.
Super User

Re: LAG not taking the value

It returns the previous value that it stored the last time it executed.  It has nothing to do with previous observations.  Just previous times that you executed the function. And if you have more than one LAG() function in the same dataset step they each keep their own stack of values, even if they are referencing (storing/retrieving) the same variable.

Consider it like a stack of trays in a cafeteria.  When you walk in you take the top tray off the stack and place the tray you came in with onto the stack.  Then the next time you do the same thing so you take out the tray you put in the previous time and put in the tray you currently have.

Super User

Re: LAG not taking the value

@Diemo wrote:
Hmm, ok. So what does LAG() actually do (I was under the impression that LAG() was how you retained a value for future processing, but that seems to be incorrect.

That would be RETAIN

Discussion stats
• 8 replies
• 2609 views
• 0 likes
• 3 in conversation