## Area under the curve calculation

Occasional Contributor
Posts: 5

# Area under the curve calculation

Hello I have a set of data, I need to calculate the Area Under the Curve, how can I achieve that in SAS?

 ID Time Concentration 2 0 1.120540964 2 4 0.581928655 2 8 0.532928263 2 12 0.366326931 2 16 0.481379851 2 20 0.614268914 2 24 0.692669541 2 28 0.340258722 2 32 0.447275578 2 36 0.394551156 2 40 0.611916895 2 44 0.611916895 2 48 0.476871815 3 0 1.466453674 3 4 0.768142401 3 8 0.464399817 3 12 0.500912825 3 16 0.751939754 3 20 0.829758101 3 24 0.81195801 3 28 0.523505249 3 32 0.508443633 3 36 0.356230032 3 40 0.306481059 3 44 0.452304884 3 48 0.329073482 4 0 0.569163694 4 4 0.423900119 4 8 0.372968688 4 12 0.503963535 4 16 0.447483155 4 20 0.760602457 4 24 0.752080856 4 28 1.273285771 4 32 0.326397146 4 36 0.448474039 4 40 0.478596908 4 44 0.714229092 4 48 0.563416568 5 0 0.829721362 5 4 1.272832817 5 8 0.621904025 5 12 0.367840557 5 16 0.47871517 5 20 1.062113003 5 24 0.796633127 5 28 0.395123839 5 32 0.367260062 5 36 0.935178019 5 40 1.938660991 5 44 0.949690402 5 48 1.617066563

Thank you

Super User
Posts: 13,942

## Re: Area under the curve calculation

What role does ID play in this?

When you say "under the curve" do you mean under a smoothed curve going through those points or the sums of areas of the trapezoids defined by successive time values?

Occasional Contributor
Posts: 5

## Re: Area under the curve calculation

ID is the name is different subjects in my study. I prefer to have the area of the trapezoids

Thank you

PROC Star
Posts: 1,405

## Re: Area under the curve calculation

Do you want this done for each id or as a whole?

Occasional Contributor
Posts: 5

## Re: Area under the curve calculation

For each ID, thank you

Super User
Posts: 13,942

## Re: Area under the curve calculation

[ Edited ]

One way:

```data want;
set have;
by id;
retain area;
lt = lat(time);
lc = lag(concentration);
if first.id then area=0;
else area = area + (time - lt)*mean(concentration, lc);
run;```

Area here is the cumulative area and the last value of Id has to total.

You didn't show what the output should look like. If you only want final total then add : If last.id; to the end of the code.

Super User
Posts: 10,850

## Re: Area under the curve calculation

@Rick_SAS has written a blog about it .

Posts: 1,399

## Re: Area under the curve calculation

Since you specify "area under CURVE", I presume you want one observation per ID:

``````data have;
input ID Time Concentration ;
datalines;
2  0 1.120540964
2  4 0.581928655
2  8 0.532928263
2 12 0.366326931
2 16 0.481379851
2 20 0.614268914
2 24 0.692669541
2 28 0.340258722
2 32 0.447275578
2 36 0.394551156
2 40 0.611916895
2 44 0.611916895
2 48 0.476871815
3  0 1.466453674
3  4 0.768142401
3  8 0.464399817
3 12 0.500912825
3 16 0.751939754
3 20 0.829758101
3 24 0.81195801
3 28 0.523505249
3 32 0.508443633
3 36 0.356230032
3 40 0.306481059
3 44 0.452304884
3 48 0.329073482
4  0 0.569163694
4  4 0.423900119
4  8 0.372968688
4 12 0.503963535
4 16 0.447483155
4 20 0.760602457
4 24 0.752080856
4 28 1.273285771
4 32 0.326397146
4 36 0.448474039
4 40 0.478596908
4 44 0.714229092
4 48 0.563416568
5  0 0.829721362
5  4 1.272832817
5  8 0.621904025
5 12 0.367840557
5 16 0.47871517
5 20 1.062113003
5 24 0.796633127
5 28 0.395123839
5 32 0.367260062
5 36 0.935178019
5 40 1.938660991
5 44 0.949690402
5 48 1.617066563
run;

data want (keep=id area);
set have;
by id;
area + dif(time)*mean(concentration,lag(concentration));
if first.id then area=0;
if last.id;
run;``````

The DIF function is equivalent to X-lag(X).

Occasional Contributor
Posts: 5

## Re: Area under the curve calculation

Thank you very much for your response, but the code doesn't work. this is the message that comes out. I'm really new in SAS, the only reason I use is because the pressure in my department, apparently if you don't use SAS you are not considerate an animal scientist

376  ;
377
378  data want (keep=id area);
379    set have;
380    by id;
381    area + dif(time)*mean(concentration,lag(concentration));
382    if first.id then area=0;
383    if last.id;
384  run;

NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line)Column).
1 at 381:19
NOTE: There were 52 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time           0.06 seconds
cpu time            0.04 seconds

Posts: 1,399

## Re: Area under the curve calculation

1. Did you look at the data? If so, what's wrong with it?
2. What about the log makes you think something is wrong?  Here's what I get from the  log:
1. 52 observations were read in (i.e. 13 observations each for 4 id's).
2. 4 observations were written out - one per id, yielding the total area for each id.
3. There is a note that a missing value was generated ONCE in line 381.  So the other 51 times line 381 was executed, a  missing value was NOT generated.

What was that one time?   Since the DIF function is defined as dif(x)=x-lag(x), then the missing value was generated for the first observation, since  lag(x) would be missing at the point.

However, while a missing value is generated for the DIF(x) expression in observation 1, it's not propagated to area, because the statement is effectively
area + .       (a summing statement)
rather than area=area+.   (an assignment statement)

The summing statement for obs 1 is
0+.
which yields a 0 in observation 1.

So If there is a  problem, you need to be more specific about what it is.

Super User
Posts: 8,220

## Re: Area under the curve calculation

Mark,

I agree that your program works. But I question whether it is correctly calculating the area under the curve.

I know better than to challenge you in a math competition but, years ago, I wrote programs in basic and fortran that calculated a Gini. Out of curiosity, I took the basic code and, without trying to optimize it, converted it so that it would run in SAS. I compared the results with some examples I found on the web (e.g., https://www3.nd.edu/~wbrooks/GiniNotes.pdf ).

My results matched their results, but didn't match the results obtained with your code.

Am I missing something?

Here is what I ran:

``````data have;
input id time concentration;
cards;
1 0.02 0.10
1 0.03 0.05
1 0.10 0.15
1 0.50 0.50
1 0.35 0.20
;

data want;
set have (rename=(concentration=x time=y));
if missing(y) then y=1;
p=x/(x+y);
run;

proc sort data=want;
by id descending p;
run;

data want;
do until (last.id);
set want;
by id;
if first.id then do;
sumx=x;
sumy=y;
end;
else do;
sumx+x;
sumy+y;
end;
end;
do until (last.id);
set want;
by id;
output;
end;
run;

data want (keep=g s);
set want;
by id;
if first.id then do;
g=0;
cumx=0;
cumy=0;
end;
xpct=x/sumx;
ypct=y/sumy;
G+(((CUMY*2)+YPCT)*XPCT);
CUMX+XPCT;
CUMY+YPCT;
S+ABS(XPCT-YPCT);
if last.id then do;
G=(1-G)*100;
S=S*.5*100;
output;
end;
run;
``````

Art, CEO, AnalystFinder.com

Posts: 1,399

## Re: Area under the curve calculation

Art:

I don't think the OP here is asking for Gini  (area between a 45 degree line and a Lorenz curve connecting cumulative totals of ordered scores of the criterion variable).  So I wouldn't expect your program to produce the same value as my program above.

My code above is supposed to sum 12 trapezoids per ID, arising from 13 observations in sequence, where the trapezoid height is the diference in successive times, and the two bases are the current and previous concentration scores.

If there's a request for an overall measure of inequality in the distribution of the criterion, I totally missed it.

regards,

Mark

Super User
Posts: 8,220

## Re: Area under the curve calculation

@ Mark: I totally agree! Would you agree with the following?:

@Aridany: The more I look at your data and question, I think that @mkeintz's approach will do what you want. I had been thinking about it like a Gini problem (i.e., a measure of inequality), but I don't think that is what it is.

You want to know the area under the curve, but you haven't indicated what is being plotted.

The following is based on plotting percent of total concentration as a function of time (thus the total area of each plot would be 48). Without some kind of standardization like that, I'm not sure if the results for each id are comparable with each other:

``````data need (drop=total);
do until (last.id);
set have;
by id;
total+concentration;
end;
do until (last.id);
set have;
by id;
pct=concentration/total;
output;
end;
run;

data want (keep=id area_below area_above);
set need;
by id;
area_below + dif(time)*mean(pct,lag(pct));
area_above + dif(time)*mean(1-pct,1-lag(pct));
if first.id then do;
area_below=0;
area_above=0;
end;
if last.id;
run;
``````

Art, CEO, AnalystFinder.com

Occasional Contributor
Posts: 5

## Re: Area under the curve calculation

Hello,

Thanks everybody for the answers, it looks like you really know what you are saying. The biggest problem I have is I don't understand what you mean by "want", "need" and "have" code. I'm not a mathematician, I just need the area under the curve, I don't understand all that you are saying, so I'm going to use Excel to calculate that. I prefer to be a weirdo in my department for not using SAS than kill myself trying to understand what you all are saying. Thanks again all for your effort. Really surprised me how much you know. Impressive. I prefer to keep using statistic packages more user frendly like Minitab or SPSS.

Have a great day,

Ary

Posts: 1,399

## Re: Area under the curve calculation

@Aridany

WANT, NEED, HAVE.  This is not a sas-specific terminology. nor is it mathematical.

As a problem generalization, the user typically has a dataset, which we generically call HAVE.  And the user commonly desires to produce results in the form of another datasets, which we generically call WANT.   And sometimes you must create an intermediate data set to solve the problem in two steps instead of one.  We generically call that NEED.

If you had created your own dataset name when you posed the problem, we may very well have used those names instead.

You didn't specify how you wanted the result (as a printable file, or a data set), which I think is understandable.  However, you apparently didn't look at the resulting dataset that my code produced.  And only now do I think that you wanted the area-under-curve printed out in the log or listing as opposed to being stored in a data set.

User-friendliness, even in SAS, can benefit from clarifying the problem statement when respondents generate something other than what you want.  Mutual patience can be rewarding.

Discussion stats
• 14 replies
• 159 views
• 2 likes
• 6 in conversation