## Can someone help me with this lag function, where only the first row is known?

hi, I have something like below:

``````data data;
infile datalines delimiter='	';
input num	total	N	N1;
datalines;
633	3000	2367	2367
450	3000	.	.
22	3000	.	.
11	3000	.	.
10	3000	.	.
9	3000	.	.
4	3000	.	.
5	3000	.	.
1	3000	.	.
5	3000	.	.

;``````

This should be really easy but I'm not so good with SAS.. I simply want to populate the next rows by the difference between lagged N (or N1) and num... so I'd want 2367, 1734, 1284, etc.

I'm trying N1=lag(N1)-lag(num) but I get this error:

Missing values were generated as a result of performing an operation on
missing values.

Any help would be really appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions

## Re: Can someone help me with this lag function, where only the first row is known?

I'm not sure what you want to do exactly. I suppose your issue is that when you the lag function the variable in the lag function has to be a variable from the dataset in the set statement. I suppose that is the cause for the missing values message.

Try to run the code below and see if that makes sense to you:

``````data have;
infile datalines delimiter='	';
input num	total	N	N1;
datalines;
633	3000	2367	2367
450	3000	.	.
22	3000	.	.
11	3000	.	.
10	3000	.	.
9	3000	.	.
4	3000	.	.
5	3000	.	.
1	3000	.	.
5	3000	.	.
;

data want;
set have;
retain help;
if _n_ eq 1 then do; N1new = N1; help = N1; end;
lagnum = lag(num);
if lagnum ne . then do; N1new = help - lagnum; help = N1new; end;
run;``````

Best regards, Jos

5 REPLIES 5

## Re: Can someone help me with this lag function, where only the first row is known?

Hello,

I cannot figure out what you want.

Do you want something like the below? See WANT dataset produced.

``````data have;
infile datalines delimiter='	';
input num	total	N	N1;
numrow=_N_;
datalines;
633	3000	2367	2367
450	3000	.	.
22	3000	.	.
11	3000	.	.
10	3000	.	.
9	3000	.	.
4	3000	.	.
5	3000	.	.
1	3000	.	.
5	3000	.	.
;
run;

proc timedata data=have out=_NULL_ OUTARRAY=WANT(drop=numrow _season_)
print=(scalars arrays);
id numrow interval=day acc=total format=date9.;
vars num	total;
outarrays MyN;

do t = 1 to dim(num);
if t=1 then do; MyN[t] = total[t] - num[t]; end;
else        do; MyN[t] = MyN[t-1] - num[t]; end;
end;
run;
/* end of program */``````

Thanks,

Koen

## Re: Can someone help me with this lag function, where only the first row is known?

I'm not sure what you want to do exactly. I suppose your issue is that when you the lag function the variable in the lag function has to be a variable from the dataset in the set statement. I suppose that is the cause for the missing values message.

Try to run the code below and see if that makes sense to you:

``````data have;
infile datalines delimiter='	';
input num	total	N	N1;
datalines;
633	3000	2367	2367
450	3000	.	.
22	3000	.	.
11	3000	.	.
10	3000	.	.
9	3000	.	.
4	3000	.	.
5	3000	.	.
1	3000	.	.
5	3000	.	.
;

data want;
set have;
retain help;
if _n_ eq 1 then do; N1new = N1; help = N1; end;
lagnum = lag(num);
if lagnum ne . then do; N1new = help - lagnum; help = N1new; end;
run;``````

Best regards, Jos

## Re: Can someone help me with this lag function, where only the first row is known?

thanks that worked! I guess if I understand correctly you must make the new lagged variable in the datastep

## Re: Can someone help me with this lag function, where only the first row is known?

@richart wrote:

thanks that worked! I guess if I understand correctly you must make the new lagged variable in the datastep

There is a difference between retaining a variable (which just means it is not set to missing at the start of each iteration) and the LAG() function.

The LAG() function basically just builds a stack/queue of values. Each time it executes the current value of its argument is pushed onto the stack and the request lagged instance is returned as the result.

To see what is happening for your simple data step adds some PUT or PUTLOG statement at different places in your data step to see how the values are changing.

``````data want;
put (_n_ num total N N1 n1new help lagnum) (=);
set have;
put (_n_ num total N N1 n1new help lagnum) (=);
retain help;
if _n_ eq 1 then do; N1new = N1; help = N1; end;
put (_n_ num total N N1 n1new help lagnum) (=);
lagnum = lag(num);
put (_n_ num total N N1 n1new help lagnum) (=);
if lagnum ne . then do; N1new = help - lagnum; help = N1new; end;
put (_n_ num total N N1 n1new help lagnum) (=);
run;``````

## Re: Can someone help me with this lag function, where only the first row is known?

The first time your LAG() function call runs the result is always a missing because it has not yet saved any values from the previous times it ran.

Do not use addition with missing values, the result is always a missing value.  The SUM() function will ignore the missing values and just add up the non-missing values.

You do NOT want the NEW variables to already exist in the input dataset. When you read the next observation with the SET statement it will retrieve those missing values and overwrite anything you might have tried to calculate.

So you have this series of NUM values.

``````data have;
input num @@;
cards;
633 450 22 11 10 9 4 5 1 5
;``````

And some initial value for the NEW variable.

Perhaps you want 3,000 as the initial value?

``````data want;
set have ;
retain n1 3000;
n1 = sum(n1,-lag(num));
run;``````

Results

```Obs    num     n1

1    633    3000
2    450    2367
3     22    1917
4     11    1895
5     10    1884
6      9    1874
7      4    1865
8      5    1861
9      1    1856
10      5    1855
```
Discussion stats
• 5 replies
• 337 views
• 3 likes
• 4 in conversation