BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
donanlb
Fluorite | Level 6

I'm calculating the area under the curve for time series measurements for n individuals and I want to do this for more than ten different variables:

 

data want; set have; by id;

var1area+dif(date)*mean(var1,lag(var1));if first.id then var1area=0;

var2area+dif(date)*mean(var2,lag(var1));if first.id then var2area=0;

...;

run;

 

It runs well, however, when I tried to put this into an array function it didn't work most likely because of the "if first.id then _var(i)area =0. All columns except the first one will give me 0 in every observation.

Does anyone have a solution?

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

You're welcome. Glad to see that my code worked for your data.

 

Note that the new variable COUNT is not needed. The only reason for my ifn(_n_>1, ...) construct was to avoid the unnecessary note "Missing values were generated ..." in the log. This note would occur during the AUC calculation due to the missing value returned by the DIF function when it is called for the first time, i.e., when _n_=1. (I should have explained this.) The results in dataset WANT would have been the same, had I just written 

d=dif(date);

The value of variable d in the first observation of an ID, be it missing or incorrect, does not affect the results anyway because 

if first.id then AUC[i]=0;

and the variable is dropped.

 

Your calculation of _avevar(i) may cause unnecessary notes in the log (such as "Division by zero detected ..." or "Mathematical operations could not be performed ..."). I think you could avoid them (without changing the results) by using conditional assignment statements:

if first.id then _avevar(i)=_var(i);
else _avevar(i)=_area(i)/timelag;

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

If I am understanding you properly (and I may not be)

 

You want an ARRAY statement, such as

 

array vararea var1area var2area ... ; /* You type the rest */
array var var1 var2 ... ; /* You type the rest */

 

and then call the arrays via

 

vararea(i)+dif(date)*mean(var(i),lag(var(i)));if first.id then vararea(i)=0;

 

BUT — IMPORTANT — make your life simpler, don't name variables var1area var2area ... name them vararea1, vararea2, ... or even better yet area1, area2, ... with numeric suffixes (not the number inside the variable name like var1area)

 

In which case everything is simpler!! Then you can use SAS variable lists, such as below:

 

array area area1-area10; 
array var var1-var10;

/* Call the array */

area(i)+dif(date)*mean(var(i),lag(var(i)));if first.id then area(i)=0;

 

 

--
Paige Miller
FreelanceReinh
Jade | Level 19

Hello @donanlb and welcome to the SAS Support Communities!

 

The problem is that dif(date)=0 starting from the second iteration of the loop because the date doesn't change. Assign dif(date) to a variable before the loop and then use that variable in the loop.

 

Example:

data want(drop=i d);
array var[15];
array AUC[15];
set have;
by id;
d=ifn(_n_>1,dif(date),0);
do i=1 to dim(var);
  AUC[i]+d*mean(var[i],lag(var[i]));
  if first.id then AUC[i]=0;
end;
*if last.id;
*drop var: date;
run;
donanlb
Fluorite | Level 6

Thanks a lot - that worked!

 

I additionally inserted an id count before the suggested variable - otherwise it would always have calculated the difference from the start of the sheet instead of from the start of each id. In the end I inserted another array to calculate the time-averaged mean by using the timelag variable that I had already calculated before for each id from the first to the individual observation. 

 

This will save me a lot of time during the next days 🙂

Here's the code:

data want ;set have;count+1;by id;
if first.id then count=1;
d=ifn(count>1,dif(date),0);
array _var(*) var1-var17;
array _area(*) area1-area17;
array _avevar(*) avevar1-avevar17;
do i=1 to dim(_var);
_area(i)+d*mean(_var(i),lag(_var(i)));if first.id then _area(i)=0;
_avevar(i)=_area(i)/timelag;
if _avevar(i)=. then _avevar(i)=_var(i);
end;
run;
FreelanceReinh
Jade | Level 19

You're welcome. Glad to see that my code worked for your data.

 

Note that the new variable COUNT is not needed. The only reason for my ifn(_n_>1, ...) construct was to avoid the unnecessary note "Missing values were generated ..." in the log. This note would occur during the AUC calculation due to the missing value returned by the DIF function when it is called for the first time, i.e., when _n_=1. (I should have explained this.) The results in dataset WANT would have been the same, had I just written 

d=dif(date);

The value of variable d in the first observation of an ID, be it missing or incorrect, does not affect the results anyway because 

if first.id then AUC[i]=0;

and the variable is dropped.

 

Your calculation of _avevar(i) may cause unnecessary notes in the log (such as "Division by zero detected ..." or "Mathematical operations could not be performed ..."). I think you could avoid them (without changing the results) by using conditional assignment statements:

if first.id then _avevar(i)=_var(i);
else _avevar(i)=_area(i)/timelag;
donanlb
Fluorite | Level 6
Thank you for clarifying!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1263 views
  • 3 likes
  • 3 in conversation