## Array function for Area under the Curve with multiple variables?

I'm calculating the area under the curve for time series measurements for n individuals and I want to do this for more than ten different variables:

data want; set have; by id;

var1area+dif(date)*mean(var1,lag(var1));if first.id then var1area=0;

var2area+dif(date)*mean(var2,lag(var1));if first.id then var2area=0;

...;

run;

It runs well, however, when I tried to put this into an array function it didn't work most likely because of the "if first.id then _var(i)area =0. All columns except the first one will give me 0 in every observation.

Does anyone have a solution?

1 ACCEPTED SOLUTION

Accepted Solutions

## Re: Array function for Area under the Curve with multiple variables?

You're welcome. Glad to see that my code worked for your data.

Note that the new variable COUNT is not needed. The only reason for my ifn(_n_>1, ...) construct was to avoid the unnecessary note "Missing values were generated ..." in the log. This note would occur during the AUC calculation due to the missing value returned by the DIF function when it is called for the first time, i.e., when _n_=1. (I should have explained this.) The results in dataset WANT would have been the same, had I just written

`d=dif(date);`

The value of variable d in the first observation of an ID, be it missing or incorrect, does not affect the results anyway because

`if first.id then AUC[i]=0;`

and the variable is dropped.

Your calculation of _avevar(i) may cause unnecessary notes in the log (such as "Division by zero detected ..." or "Mathematical operations could not be performed ..."). I think you could avoid them (without changing the results) by using conditional assignment statements:

```if first.id then _avevar(i)=_var(i);
else _avevar(i)=_area(i)/timelag;```
5 REPLIES 5

## Re: Array function for Area under the Curve with multiple variables?

If I am understanding you properly (and I may not be)

You want an ARRAY statement, such as

``````array vararea var1area var2area ... ; /* You type the rest */
array var var1 var2 ... ; /* You type the rest */``````

and then call the arrays via

``vararea(i)+dif(date)*mean(var(i),lag(var(i)));if first.id then vararea(i)=0;``

BUT — IMPORTANT — make your life simpler, don't name variables var1area var2area ... name them vararea1, vararea2, ... or even better yet area1, area2, ... with numeric suffixes (not the number inside the variable name like var1area)

In which case everything is simpler!! Then you can use SAS variable lists, such as below:

``````array area area1-area10;
array var var1-var10;

/* Call the array */

area(i)+dif(date)*mean(var(i),lag(var(i)));if first.id then area(i)=0;``````

--
Paige Miller

## Re: Array function for Area under the Curve with multiple variables?

Hello @donanlb and welcome to the SAS Support Communities!

The problem is that dif(date)=0 starting from the second iteration of the loop because the date doesn't change. Assign dif(date) to a variable before the loop and then use that variable in the loop.

Example:

```data want(drop=i d);
array var;
array AUC;
set have;
by id;
d=ifn(_n_>1,dif(date),0);
do i=1 to dim(var);
AUC[i]+d*mean(var[i],lag(var[i]));
if first.id then AUC[i]=0;
end;
*if last.id;
*drop var: date;
run;```

## Re: Array function for Area under the Curve with multiple variables?

Thanks a lot - that worked!

I additionally inserted an id count before the suggested variable - otherwise it would always have calculated the difference from the start of the sheet instead of from the start of each id. In the end I inserted another array to calculate the time-averaged mean by using the timelag variable that I had already calculated before for each id from the first to the individual observation.

This will save me a lot of time during the next days 🙂

Here's the code:

``````data want ;set have;count+1;by id;
if first.id then count=1;
d=ifn(count>1,dif(date),0);
array _var(*) var1-var17;
array _area(*) area1-area17;
array _avevar(*) avevar1-avevar17;
do i=1 to dim(_var);
_area(i)+d*mean(_var(i),lag(_var(i)));if first.id then _area(i)=0;
_avevar(i)=_area(i)/timelag;
if _avevar(i)=. then _avevar(i)=_var(i);
end;
run;``````

## Re: Array function for Area under the Curve with multiple variables?

You're welcome. Glad to see that my code worked for your data.

Note that the new variable COUNT is not needed. The only reason for my ifn(_n_>1, ...) construct was to avoid the unnecessary note "Missing values were generated ..." in the log. This note would occur during the AUC calculation due to the missing value returned by the DIF function when it is called for the first time, i.e., when _n_=1. (I should have explained this.) The results in dataset WANT would have been the same, had I just written

`d=dif(date);`

The value of variable d in the first observation of an ID, be it missing or incorrect, does not affect the results anyway because

`if first.id then AUC[i]=0;`

and the variable is dropped.

Your calculation of _avevar(i) may cause unnecessary notes in the log (such as "Division by zero detected ..." or "Mathematical operations could not be performed ..."). I think you could avoid them (without changing the results) by using conditional assignment statements:

```if first.id then _avevar(i)=_var(i);
else _avevar(i)=_area(i)/timelag;```

## Re: Array function for Area under the Curve with multiple variables?

Thank you for clarifying!
Discussion stats
• 5 replies
• 264 views
• 3 likes
• 3 in conversation