Hello,
I'm attempting to adapt a macro to calculate 3 different AUCs by subject and treatment period.
Here is a link to the original macro:
https://www.lexjansen.com/wuss/2004/posters/c_post_the_sas_calculations_.pdf
My issue is that I have 36 subjects and 4 treatment periods, and I am unsure how to modify the code to reflect two important details:
1. apply the baseline measurement from the SQL step to the appropriate subject and treatment period.
2. calculate SumTrapezoid so that it will start over when calculating a new subject and treatment period, instead of summing the entire trapezoid row.
The macro has the option to calculate the data three different (and very useful) ways. I'm guessing a few key lines could make this incorporate my study design (I just can't figure it out!). Any advice is greatly appreciated!
-Patrick
Subject Period Xtime Yvalue
5901 P1 -15 110.5
5901 P1 0 113.6
5901 P1 30 195.5
5901 P1 60 234.9
5901 P1 90 235.2
5901 P1 120 200.1
5901 P1 150 134.5
5901 P1 180 110.2
5901 P2 -15 115.1
5901 P2 0 115.6
5901 P2 30 201.1
5901 P2 60 262.5
5901 P2 90 227.6
5901 P2 120 214.4
5901 P2 150 191.7
5901 P2 180 142.6
5901 P3 -15 117.2
5901 P3 0 115.9
5901 P3 30 191.8
5901 P3 60 265.5
5901 P3 90 261.6
5901 P3 120 192.4
5901 P3 150 159.7
5901 P3 180 127.0
5902 P1 -15 92.6
5902 P1 0 95.1
5902 P1 30 140.8
5902 P1 60 154.0
5902 P1 90 121.9
5902 P1 120 108.3
5902 P1 150 84.2
5902 P1 180 94.3
5902 P2 -15 100.5
5902 P2 0 99.6
5902 P2 30 143.4
5902 P2 60 151.0
5902 P2 90 137.5
5902 P2 120 128.2
5902 P2 150 95.7
5902 P2 180 92.8
5902 P3 -15 98.3
5902 P3 0 96.7
5902 P3 30 156.0
5902 P3 60 196.6
5902 P3 90 190.3
5902 P3 120 172.8
5902 P3 150 136.9
PROC SQL;
SELECT subject, period, MEAN(Yvalue) FORMAT=6.2 LABEL='Avg Yvalue' INTO : BaseY FROM
Have WHERE Xtime <= 0
group by subject, period;
QUIT;
%MACRO AUC(baseline, dataset, output);
DATA &output;
SET &dataset (WHERE=(Xtime GE 0));
RETAIN Basevalue;
IF &baseline = 0 THEN Basevalue = 0.0;
* &BaseY shown in the following statement is the macro variable defined in any one of the above SQL
procedures;
IF (&baseline = 1 OR &baseline = 2) AND _N_ = 1 THEN Basevalue =
&BaseY;
Yvalue = Yvalue - Basevalue;
DROP LagTime LagValue;
LagTime = LAG(Xtime);
LagValue = LAG(Yvalue);
IF Xtime = 0 THEN DO;
LagTime = 0;
LagValue = 0;
END;
IF &baseline = 2 AND Yvalue > 0 AND LagValue <= 0.0 THEN DO;
* Connecting line with positive slope, only the area of right triangle (above baseline) is counted.;
DROP Ratio;
Ratio = Yvalue / (ABS(LagValue)+Yvalue);
Trapezoid = Ratio*(Xtime-LagTime)*(Yvalue+0.00)/2;
END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue >= 0.0 THEN DO;
* Connecting line with negative slope, only the area of left triangle (above baseline) is counted.;
DROP Ratio;
Ratio = LagValue / (LagValue+ABS(Yvalue));
Trapezoid = Ratio*(Xtime-LagTime)*(0.00+LagValue)/2;
END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue < 0 THEN Trapezoid =
0.0;
* Negative trapezoidal area is not counted.;
ELSE Trapezoid = (Xtime-LagTime)*(Yvalue+LagValue)/2;
* The rest of all positive trapezoidal areas are counted.;
SumTrapezoid + Trapezoid;
FORMAT Trapezoid SumTrapezoid 8.3;
RUN;
%MEND AUC;
%AUC(2, have, want);
My guess is that modifying the code or the macro would be useless until you understand how it works. How can the original PROC SQL create just a single macro variable accurately, when it contains a GROUP BY clause? Just which observations are being used to calculate &BaseY?
If you can answer that, the solution probably involves scrapping macro language entirely. You would need sorted data, and a BY statement in the DATA step. For example, you might replace this condition:
if _n_=1 then ....
Instead it would become:
if first.subject then .....
But there is no moving forward until you understand what the code is doing now. That's the first step in understanding what needs to change.
1. Modify the Macro definition to include the parameters of Subject and Period based on your data
2. Move the SQL step into the macro to calculate BaseY in the macro.
3. Modify the data step to only include the data relevant to that subject and period - you also need to move the WHERE data set option outside.
%MACRO AUC(dataset=, baseline=, subject=, period=, output=);
proc sql noprint;
select mean(yValue) format=6.2 into :baseY
from &dataset
where subject="&subject." and period="&period.";
quit;
DATA &output;
SET &dataset;
where subject="&subject." and period="&period." and Xtime GE 0;
....
3. Create a list of the individual subject/periods combination. No idea where baseline is created here, that's up to you to sort out.
*create list of baseline, subject and period to call the macro;
PROC SQL;
create table macro_call as
SELECT distinct 2 as baseline, subject, period
from have;
QUIT;
4. Call the macro once for each line in the data set and create an output dataset of the form: OUTPUT_SUBJECTID_PERIOD
*execute the macro for each line;
data demo;
set macro_call;
str = catt('%AUC(dataset=Have', ', baseline=', baseline, ', subject = ', subject, ', period= ', period, ', output= OUT_', catx('_', subject, period), ');');
call execute(str);
run;
5. Combine your output datasets into one data set.
data combined_out;
set output_: ;
run;
As someone else mentioned you should really understand the code but this gets you around it for now. I have no idea if this code is correct to calculate the AUC.
I'm also assuming that you only need the last line for each data set not the full one, but also leaving that up to you.
@DrSolverson wrote:
Hello,
I'm attempting to adapt a macro to calculate 3 different AUCs by subject and treatment period.
Here is a link to the original macro:
https://www.lexjansen.com/wuss/2004/posters/c_post_the_sas_calculations_.pdf
My issue is that I have 36 subjects and 4 treatment periods, and I am unsure how to modify the code to reflect two important details:
1. apply the baseline measurement from the SQL step to the appropriate subject and treatment period.
2. calculate SumTrapezoid so that it will start over when calculating a new subject and treatment period, instead of summing the entire trapezoid row.
The macro has the option to calculate the data three different (and very useful) ways. I'm guessing a few key lines could make this incorporate my study design (I just can't figure it out!). Any advice is greatly appreciated!
-Patrick
Subject Period Xtime Yvalue 5901 P1 -15 110.5 5901 P1 0 113.6 5901 P1 30 195.5 5901 P1 60 234.9 5901 P1 90 235.2 5901 P1 120 200.1 5901 P1 150 134.5 5901 P1 180 110.2 5901 P2 -15 115.1 5901 P2 0 115.6 5901 P2 30 201.1 5901 P2 60 262.5 5901 P2 90 227.6 5901 P2 120 214.4 5901 P2 150 191.7 5901 P2 180 142.6 5901 P3 -15 117.2 5901 P3 0 115.9 5901 P3 30 191.8 5901 P3 60 265.5 5901 P3 90 261.6 5901 P3 120 192.4 5901 P3 150 159.7 5901 P3 180 127.0 5902 P1 -15 92.6 5902 P1 0 95.1 5902 P1 30 140.8 5902 P1 60 154.0 5902 P1 90 121.9 5902 P1 120 108.3 5902 P1 150 84.2 5902 P1 180 94.3 5902 P2 -15 100.5 5902 P2 0 99.6 5902 P2 30 143.4 5902 P2 60 151.0 5902 P2 90 137.5 5902 P2 120 128.2 5902 P2 150 95.7 5902 P2 180 92.8 5902 P3 -15 98.3 5902 P3 0 96.7 5902 P3 30 156.0 5902 P3 60 196.6 5902 P3 90 190.3 5902 P3 120 172.8 5902 P3 150 136.9
PROC SQL; SELECT subject, period, MEAN(Yvalue) FORMAT=6.2 LABEL='Avg Yvalue' INTO : BaseY FROM Have WHERE Xtime <= 0 group by subject, period; QUIT; %MACRO AUC(baseline, dataset, output); DATA &output; SET &dataset (WHERE=(Xtime GE 0)); RETAIN Basevalue; IF &baseline = 0 THEN Basevalue = 0.0; * &BaseY shown in the following statement is the macro variable defined in any one of the above SQL procedures; IF (&baseline = 1 OR &baseline = 2) AND _N_ = 1 THEN Basevalue = &BaseY; Yvalue = Yvalue - Basevalue; DROP LagTime LagValue; LagTime = LAG(Xtime); LagValue = LAG(Yvalue); IF Xtime = 0 THEN DO; LagTime = 0; LagValue = 0; END; IF &baseline = 2 AND Yvalue > 0 AND LagValue <= 0.0 THEN DO; * Connecting line with positive slope, only the area of right triangle (above baseline) is counted.; DROP Ratio; Ratio = Yvalue / (ABS(LagValue)+Yvalue); Trapezoid = Ratio*(Xtime-LagTime)*(Yvalue+0.00)/2; END; ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue >= 0.0 THEN DO; * Connecting line with negative slope, only the area of left triangle (above baseline) is counted.; DROP Ratio; Ratio = LagValue / (LagValue+ABS(Yvalue)); Trapezoid = Ratio*(Xtime-LagTime)*(0.00+LagValue)/2; END; ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue < 0 THEN Trapezoid = 0.0; * Negative trapezoidal area is not counted.; ELSE Trapezoid = (Xtime-LagTime)*(Yvalue+LagValue)/2; * The rest of all positive trapezoidal areas are counted.; SumTrapezoid + Trapezoid; FORMAT Trapezoid SumTrapezoid 8.3; RUN; %MEND AUC; %AUC(2, have, want);
@Rick_SAS wrote a blog about it a couple of years ago.
It's a really nice macro! Very useful for oral glucose tolerance test data sets - the three different ways it calculates AUC is attractive. The issue I'm having is that it was written for analysis of 1 subject's OGTT data, where I would like to apply it to several subjects (and their respective treatments). As the above users would tell you, I am really inexperienced (macros being a new area entirely), but this code will be useful if I can get the help I need. I've also consulted with a statistician on my research campus - I'll post their version when I hear back.
Hi Reeza,
Thank you for providing these directions. I am tinkering with it as we speak! Will follow-up.
Hi all,
Sorry for the delayed response, still waiting on the script the campus statisticians are building - will provide when I receive it.
In the meantime, here's the code based almost entirely off of the directions Reeza provided (thank you, Reeza!) along with some sample data at the end to show that their iteration works.
When working with the entire dataset, I have to restart SAS before switching between the three options to calculate AUC, otherwise baseline will not be calculated correctly - it will apply the first subject and period to all subjects and periods. I tried deleting the SQL table before switching between options and that doesn't appear to resolve the issue. Insignificant inconvenience for what is gained by the program.
%MACRO AUC(dataset, baseline, subject, treatment, output);
proc sql noprint;
select mean(yValue) format=6.2 into :baseY
from &dataset
where subject="&subject." and treatment="&treatment." and Xtime <= 0;
quit;
DATA &output;
SET &dataset;
where subject="&subject." and treatment="&treatment." and Xtime GE 0;
RETAIN Basevalue;
IF &baseline = 0 THEN Basevalue = 0.0;
* &BaseY shown in the following statement is the macro variable defined in any one of the above SQL
procedures;
IF (&baseline = 1 OR &baseline = 2) AND _N_ = 1 THEN Basevalue =
&BaseY;
if Xtime=0 then Yvalue = &BaseY;
Yvalue = Yvalue - Basevalue;
DROP LagTime LagValue;
LagTime = LAG(Xtime);
LagValue = LAG(Yvalue);
IF Xtime = 0 THEN DO;
LagTime = 0;
LagValue = 0;
END;
IF &baseline = 2 AND Yvalue > 0 AND LagValue <= 0.0 THEN DO;
* Connecting line with positive slope, only the area of right triangle (above baseline) is counted.;
DROP Ratio;
Ratio = Yvalue / (ABS(LagValue)+Yvalue);
Trapezoid = Ratio*(Xtime-LagTime)*(Yvalue+0.00)/2;
END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue >= 0.0 THEN DO;
* Connecting line with negative slope, only the area of left triangle (above baseline) is counted.;
DROP Ratio;
Ratio = LagValue / (LagValue+ABS(Yvalue));
Trapezoid = Ratio*(Xtime-LagTime)*(0.00+LagValue)/2;
END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue < 0 THEN Trapezoid =
0.0;
* Negative trapezoidal area is not counted.;
ELSE Trapezoid = (Xtime-LagTime)*(Yvalue+LagValue)/2;
* The rest of all positive trapezoidal areas are counted.;
SumTrapezoid + Trapezoid;
FORMAT Trapezoid SumTrapezoid 8.3;
RUN;
%MEND AUC;
PROC SQL;
create table macro_call as
SELECT distinct 2 as baseline, subject, treatment
from gluc2;
QUIT;
*execute the macro for each line; data demo; set macro_call; str =
catt('%AUC(dataset=gluc2', ', baseline=', baseline, ', subject = ', subject, ', treatment =
', treatment, ', output= OUT_', catx('_', subject, treatment), ');'); call execute(str); run;
data combined_out_baseline2;
set out_: ;
run;
proc sort data = combined_out_baseline2;
by subject period;
run;
proc print data = combined_out_baseline2;
run;
/*code below is for if you just want to try running the macro on one subject and period instead of the whole dataset*/
/*(dataset, baseline, subject, treatment, output)*/
%AUC(gluc2,1,5919,CX, practice);
proc print data = practice;
run;
/*heres the whole dataset*/
data gluc;
input Subject $ period $ xtime yvalue;
datalines;
5901 1 -15 110.5
5901 1 0 113.6
5901 1 30 205
5901 1 60 240
5901 1 90 230
5901 1 120 195
5901 1 150 125
5901 1 180 105.2
5901 2 -15 110.1
5901 2 0 110.6
5901 2 30 201.1
5901 2 60 270.5
5901 2 90 220.6
5901 2 120 205.4
5901 2 150 180.7
5901 2 180 130.6
5901 3 -15 115.2
5901 3 0 115.9
5901 3 30 188.8
5901 3 60 260.5
5901 3 90 250.6
5901 3 120 180.4
5901 3 150 145.7
5901 3 180 115.0
;
run;
/* Add treatment variable to glucose dataset based on period info */
data gluc2;
set gluc;
length treatment $8;
if subject = '5901' and period = '1' then treatment = 'CX';
if subject = '5901' and period = '2' then treatment = 'BY';
if subject = '5901' and period = '3' then treatment = 'AZ';
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.