hi, I have something like below:
data data;
infile datalines delimiter=' ';
input num total N N1;
datalines;
633 3000 2367 2367
450 3000 . .
22 3000 . .
11 3000 . .
10 3000 . .
9 3000 . .
4 3000 . .
5 3000 . .
1 3000 . .
5 3000 . .
;
This should be really easy but I'm not so good with SAS.. I simply want to populate the next rows by the difference between lagged N (or N1) and num... so I'd want 2367, 1734, 1284, etc.
I'm trying N1=lag(N1)-lag(num) but I get this error:
I'm not sure what you want to do exactly. I suppose your issue is that when you the lag function the variable in the lag function has to be a variable from the dataset in the set statement. I suppose that is the cause for the missing values message.
Try to run the code below and see if that makes sense to you:
data have;
infile datalines delimiter=' ';
input num total N N1;
datalines;
633 3000 2367 2367
450 3000 . .
22 3000 . .
11 3000 . .
10 3000 . .
9 3000 . .
4 3000 . .
5 3000 . .
1 3000 . .
5 3000 . .
;
data want;
set have;
retain help;
if _n_ eq 1 then do; N1new = N1; help = N1; end;
lagnum = lag(num);
if lagnum ne . then do; N1new = help - lagnum; help = N1new; end;
run;
Best regards, Jos
Hello,
I cannot figure out what you want.
Do you want something like the below? See WANT dataset produced.
data have;
infile datalines delimiter=' ';
input num total N N1;
numrow=_N_;
datalines;
633 3000 2367 2367
450 3000 . .
22 3000 . .
11 3000 . .
10 3000 . .
9 3000 . .
4 3000 . .
5 3000 . .
1 3000 . .
5 3000 . .
;
run;
proc timedata data=have out=_NULL_ OUTARRAY=WANT(drop=numrow _season_)
print=(scalars arrays);
id numrow interval=day acc=total format=date9.;
vars num total;
outarrays MyN;
do t = 1 to dim(num);
if t=1 then do; MyN[t] = total[t] - num[t]; end;
else do; MyN[t] = MyN[t-1] - num[t]; end;
end;
run;
/* end of program */
Thanks,
Koen
I'm not sure what you want to do exactly. I suppose your issue is that when you the lag function the variable in the lag function has to be a variable from the dataset in the set statement. I suppose that is the cause for the missing values message.
Try to run the code below and see if that makes sense to you:
data have;
infile datalines delimiter=' ';
input num total N N1;
datalines;
633 3000 2367 2367
450 3000 . .
22 3000 . .
11 3000 . .
10 3000 . .
9 3000 . .
4 3000 . .
5 3000 . .
1 3000 . .
5 3000 . .
;
data want;
set have;
retain help;
if _n_ eq 1 then do; N1new = N1; help = N1; end;
lagnum = lag(num);
if lagnum ne . then do; N1new = help - lagnum; help = N1new; end;
run;
Best regards, Jos
thanks that worked! I guess if I understand correctly you must make the new lagged variable in the datastep
@richart wrote:
thanks that worked! I guess if I understand correctly you must make the new lagged variable in the datastep
There is a difference between retaining a variable (which just means it is not set to missing at the start of each iteration) and the LAG() function.
The LAG() function basically just builds a stack/queue of values. Each time it executes the current value of its argument is pushed onto the stack and the request lagged instance is returned as the result.
To see what is happening for your simple data step adds some PUT or PUTLOG statement at different places in your data step to see how the values are changing.
data want;
put (_n_ num total N N1 n1new help lagnum) (=);
set have;
put (_n_ num total N N1 n1new help lagnum) (=);
retain help;
if _n_ eq 1 then do; N1new = N1; help = N1; end;
put (_n_ num total N N1 n1new help lagnum) (=);
lagnum = lag(num);
put (_n_ num total N N1 n1new help lagnum) (=);
if lagnum ne . then do; N1new = help - lagnum; help = N1new; end;
put (_n_ num total N N1 n1new help lagnum) (=);
run;
The first time your LAG() function call runs the result is always a missing because it has not yet saved any values from the previous times it ran.
Do not use addition with missing values, the result is always a missing value. The SUM() function will ignore the missing values and just add up the non-missing values.
You do NOT want the NEW variables to already exist in the input dataset. When you read the next observation with the SET statement it will retrieve those missing values and overwrite anything you might have tried to calculate.
So you have this series of NUM values.
data have;
input num @@;
cards;
633 450 22 11 10 9 4 5 1 5
;
And some initial value for the NEW variable.
Perhaps you want 3,000 as the initial value?
data want;
set have ;
retain n1 3000;
n1 = sum(n1,-lag(num));
run;
Results
Obs num n1 1 633 3000 2 450 2367 3 22 1917 4 11 1895 5 10 1884 6 9 1874 7 4 1865 8 5 1861 9 1 1856 10 5 1855
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.