## Negative values when using lag function to calculate the area under curve

Solved
Occasional Contributor
Posts: 12

Below is the code I used to calculate AUC. The calculation itself is accurate, but I expected a missing value (".") for the first observation of each ID. Instead, from the second ID onward, the first observation of each ID returns a negative value. Does anyone know how to fix this without resorting to the statement 'if auc<0 then auc=.'? Thank you so much.

SAS Code I used:

data newdata.trial1ab;
set check4;
retain StudyID_Old;
if (_N_ = 1) then StudyID_Old = study_ID;
if (study_ID ne StudyID_Old) then
do;
x_1=.;
y_1 = .;
StudyID_Old = study_ID;
end;
x = dep_new_days;
y = Mean_Hf;
xx = x;
lny = log(y);
x_1 = lag(x);
y_1 = lag(y);
x_xx = x - x_1;
cc = ( y + y_1 ) / 2;
auc = cc * x_xx;
run;

Accepted Solutions
Solution
‎12-18-2017 01:59 PM
Posts: 1,309

## Re: Negative values when using lag function to calculate the area under curve

Because the lag function is a queue update function, not a "lookback", you need to update the queue on every observation, but use a missing value at the beginning of a BY group. But your code only updates the queue when not at the start of a new id.
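To make the queue behavior concrete, here is a minimal sketch of the pitfall described above. The dataset `have` and the variables `studyid`, `x`, `x_cond` and `x_lag` are made up for illustration; they are not from the original data.

```sas
/* Hypothetical sample data: two IDs, three observations each */
data have;
    input studyid x;
    datalines;
1 10
1 20
1 30
2 5
2 15
2 25
;
run;

data demo;
    set have;
    by studyid;
    /* Conditional call: the queue is NOT updated on the first row of an ID,
       so the next call returns a stale value from the previous ID */
    if not first.studyid then x_cond = lag(x);
    /* Unconditional call: the queue is updated on every row */
    x_lag = lag(x);
    /* Blank out the value carried over from the previous ID */
    if first.studyid then x_lag = .;
run;
```

With this sample, `x_cond` comes out as ., ., 20, ., 30, 15 (the 30 on the second row of ID 2 is left over from ID 1), while `x_lag` comes out as ., 10, 20, ., 5, 15.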

You need something like:

data want;
set have;
by studyid;
x1=ifn(first.studyid,.,lag(x));
y1=ifn(first.studyid,.,lag(y));

Because the ifn function evaluates all of its arguments, the lag queue is updated on every observation, but the result is a missing value (a dot) at the start of an id.

You should be able to take it from there.
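For completeness, a sketch of how the full step might look with the variable names from the question. The sample rows in `check4` are invented so the step runs on its own, the output goes to a WORK dataset rather than the `newdata` library, and the data are assumed to be sorted by `study_ID` and then by `dep_new_days`.

```sas
/* Hypothetical sample rows standing in for the real check4 */
data check4;
    input study_ID dep_new_days Mean_Hf;
    datalines;
1 0 2
1 10 4
1 20 6
2 0 1
2 5 3
;
run;

data trial1ab;
    set check4;
    by study_ID;   /* requires check4 to be sorted by study_ID */
    x = dep_new_days;
    y = Mean_Hf;
    /* IFN evaluates all of its arguments, so lag() runs on every row */
    x_1 = ifn(first.study_ID, ., lag(x));
    y_1 = ifn(first.study_ID, ., lag(y));
    /* Trapezoid between the previous and current point;
       auc stays missing on the first row of each ID */
    if not first.study_ID then auc = (y + y_1) / 2 * (x - x_1);
run;
```

With these sample rows, `auc` is missing on the first row of each ID and 30, 50 and 10 on the others. Guarding the calculation with `if not first.study_ID` also avoids the "missing values were generated" note in the log.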

All Replies

Occasional Contributor
Posts: 12

Thank you!
Super User
Posts: 7,932

## Re: Negative values when using lag function to calculate the area under curve

You are very close, but pay attention to the order of your statements in the data step.

You need to calculate the lagged values first, and then decide whether they are valid given the BY-group transition. In your code, x_1 and y_1 are set to missing and then immediately overwritten by the lag calls that follow.

```
data newdata.trial1ab;
    set check4;
    retain StudyID_Old;
    if (_N_ = 1) then StudyID_Old = study_ID;
    x = dep_new_days;
    y = Mean_Hf;
    xx = x;
    lny = log(y);
    /* Call lag on every observation so the queues stay in step */
    x_1 = lag(x);
    y_1 = lag(y);
    if (study_ID ne StudyID_Old) then do;
        /* First observation of a new ID: discard the carried-over values */
        x_1 = .;
        y_1 = .;
        StudyID_Old = study_ID;
    end;
    else do;
        x_xx = x - x_1;
        cc = ( y + y_1 ) / 2;
        auc = cc * x_xx;
    end;
run;
```
☑ This topic is solved.