SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
DrSolverson
Calcite | Level 5

Hello,

 

I'm attempting to adapt a macro to calculate 3 different AUCs by subject and treatment period.

 

Here is a link to the original macro:

https://www.lexjansen.com/wuss/2004/posters/c_post_the_sas_calculations_.pdf

 

My issue is that I have 36 subjects and 4 treatment periods, and I am unsure how to modify the code to reflect two important details: 

1. apply the baseline measurement from the SQL step to the appropriate subject and treatment period.

2. calculate SumTrapezoid so that it will start over when calculating a new subject and treatment period, instead of summing the entire trapezoid row.

 

The macro has the option to calculate the data three different (and very useful) ways. I'm guessing a few key lines could make this incorporate my study design (I just can't figure it out!). Any advice is greatly appreciated! 

 

-Patrick

 

Subject	Period	Xtime	Yvalue
5901	P1	-15	110.5
5901	P1	0	113.6
5901	P1	30	195.5
5901	P1	60	234.9
5901	P1	90	235.2
5901	P1	120	200.1
5901	P1	150	134.5
5901	P1	180	110.2
5901	P2	-15	115.1
5901	P2	0	115.6
5901	P2	30	201.1
5901	P2	60	262.5
5901	P2	90	227.6
5901	P2	120	214.4
5901	P2	150	191.7
5901	P2	180	142.6
5901	P3	-15	117.2
5901	P3	0	115.9
5901	P3	30	191.8
5901	P3	60	265.5
5901	P3	90	261.6
5901	P3	120	192.4
5901	P3	150	159.7
5901	P3	180	127.0
5902	P1	-15	92.6
5902	P1	0	95.1
5902	P1	30	140.8
5902	P1	60	154.0
5902	P1	90	121.9
5902	P1	120	108.3
5902	P1	150	84.2
5902	P1	180	94.3
5902	P2	-15	100.5
5902	P2	0	99.6
5902	P2	30	143.4
5902	P2	60	151.0
5902	P2	90	137.5
5902	P2	120	128.2
5902	P2	150	95.7
5902	P2	180	92.8
5902	P3	-15	98.3
5902	P3	0	96.7
5902	P3	30	156.0
5902	P3	60	196.6
5902	P3	90	190.3
5902	P3	120	172.8
5902	P3	150	136.9

 

PROC SQL;
SELECT subject, period, MEAN(Yvalue) FORMAT=6.2 LABEL='Avg Yvalue' INTO : BaseY FROM
Have  WHERE Xtime <= 0
  group by subject, period;
QUIT; 

%MACRO AUC(baseline, dataset, output); 
DATA &output;
SET &dataset (WHERE=(Xtime GE 0));
RETAIN Basevalue;
IF &baseline = 0 THEN Basevalue = 0.0;
* &BaseY shown in the following statement is the macro variable defined in any one of the above SQL
procedures;
IF (&baseline = 1 OR &baseline = 2) AND _N_ = 1 THEN Basevalue =
&BaseY;
Yvalue = Yvalue - Basevalue;
DROP LagTime LagValue;
LagTime = LAG(Xtime);
LagValue = LAG(Yvalue);
IF Xtime = 0 THEN DO;
 LagTime = 0;
 LagValue = 0;
END; 
IF &baseline = 2 AND Yvalue > 0 AND LagValue <= 0.0 THEN DO;
* Connecting line with positive slope, only the area of right triangle (above baseline) is counted.;
 DROP Ratio;
 Ratio = Yvalue / (ABS(LagValue)+Yvalue);
 Trapezoid = Ratio*(Xtime-LagTime)*(Yvalue+0.00)/2;
 END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue >= 0.0 THEN DO;
* Connecting line with negative slope, only the area of left triangle (above baseline) is counted.;
 DROP Ratio;
 Ratio = LagValue / (LagValue+ABS(Yvalue));
 Trapezoid = Ratio*(Xtime-LagTime)*(0.00+LagValue)/2;
 END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue < 0 THEN Trapezoid =
0.0;
* Negative trapezoidal area is not counted.;
ELSE Trapezoid = (Xtime-LagTime)*(Yvalue+LagValue)/2;
* The rest of all positive trapezoidal areas are counted.;
SumTrapezoid + Trapezoid;
FORMAT Trapezoid SumTrapezoid 8.3;
RUN;
%MEND AUC;

%AUC(2, have, want);

 

9 REPLIES 9
Reeza
Super User
Or call the macro once for each subject/treatment/baseline period. You can use CALL EXECUTE to call it multiple times and combine the results afterwards. Use a naming convention and then you can stack all the results easily at the end.
Astounding
PROC Star

My guess is that modifying the code or the macro would be useless until you understand how it works.  How can the original PROC SQL create just a single macro variable accurately, when it contains a GROUP BY clause?  Just which observations are being used to calculate &BaseY?

 

If you can answer that, the solution probably involves scrapping macro language entirely.  You would need sorted data, and a BY statement in the DATA step.  For example, you might replace this condition:

 

if _n_=1 then ....

 

Instead it would become:

 

if first.subject then .....

 

But there is no moving forward until you understand what the code is doing now.  That's the first step in understanding what needs to change.

Reeza
Super User
The macro has two macro variables, &baseline and &baseY, where does &baseline come from ?
Reeza
Super User

1. Modify the Macro definition to include the parameters of Subject and Period based on your data

2. Move the SQL step into the macro to calculate BaseY in the macro. 

3. Modify the data step to only include the data relevant to that subject and period - you also need to move the WHERE data set option outside. 

 

%MACRO AUC(dataset=, baseline=, subject=, period=, output=); 


proc sql noprint;
select mean(yValue) format=6.2 into :baseY
from &dataset
where subject="&subject." and period="&period.";
quit;

DATA &output;
SET &dataset;
where subject="&subject." and period="&period." and Xtime GE 0;

....

3. Create a list of the individual subject/periods combination. No idea where baseline is created here, that's up to you to sort out. 

*create list of baseline, subject and period to call the macro;
PROC SQL;
create table macro_call as
SELECT distinct 2 as baseline, subject, period
from have;
QUIT; 

4. Call the macro once for each line in the data set and create an output dataset of the form: OUTPUT_SUBJECTID_PERIOD

*execute the macro for each line;
data demo;
set macro_call;

str = catt('%AUC(dataset=Have', ', baseline=', baseline, ', subject = ', subject, ', period= ', period, ', output= OUT_', catx('_', subject, period), ');');
call execute(str);
run;

5. Combine your output datasets into one data set. 

 

data combined_out;
set output_: ;
run;

 

As someone else mentioned you should really understand the code but this gets you around it for now. I have no idea if this code is correct to calculate the AUC.

I'm also assuming that you only need the last line for each data set not the full one, but also leaving that up to you. 

 


@DrSolverson wrote:

Hello,

 

I'm attempting to adapt a macro to calculate 3 different AUCs by subject and treatment period.

 

Here is a link to the original macro:

https://www.lexjansen.com/wuss/2004/posters/c_post_the_sas_calculations_.pdf

 

My issue is that I have 36 subjects and 4 treatment periods, and I am unsure how to modify the code to reflect two important details: 

1. apply the baseline measurement from the SQL step to the appropriate subject and treatment period.

2. calculate SumTrapezoid so that it will start over when calculating a new subject and treatment period, instead of summing the entire trapezoid row.

 

The macro has the option to calculate the data three different (and very useful) ways. I'm guessing a few key lines could make this incorporate my study design (I just can't figure it out!). Any advice is greatly appreciated! 

 

-Patrick

 

Subject	Period	Xtime	Yvalue
5901	P1	-15	110.5
5901	P1	0	113.6
5901	P1	30	195.5
5901	P1	60	234.9
5901	P1	90	235.2
5901	P1	120	200.1
5901	P1	150	134.5
5901	P1	180	110.2
5901	P2	-15	115.1
5901	P2	0	115.6
5901	P2	30	201.1
5901	P2	60	262.5
5901	P2	90	227.6
5901	P2	120	214.4
5901	P2	150	191.7
5901	P2	180	142.6
5901	P3	-15	117.2
5901	P3	0	115.9
5901	P3	30	191.8
5901	P3	60	265.5
5901	P3	90	261.6
5901	P3	120	192.4
5901	P3	150	159.7
5901	P3	180	127.0
5902	P1	-15	92.6
5902	P1	0	95.1
5902	P1	30	140.8
5902	P1	60	154.0
5902	P1	90	121.9
5902	P1	120	108.3
5902	P1	150	84.2
5902	P1	180	94.3
5902	P2	-15	100.5
5902	P2	0	99.6
5902	P2	30	143.4
5902	P2	60	151.0
5902	P2	90	137.5
5902	P2	120	128.2
5902	P2	150	95.7
5902	P2	180	92.8
5902	P3	-15	98.3
5902	P3	0	96.7
5902	P3	30	156.0
5902	P3	60	196.6
5902	P3	90	190.3
5902	P3	120	172.8
5902	P3	150	136.9

 

PROC SQL;
SELECT subject, period, MEAN(Yvalue) FORMAT=6.2 LABEL='Avg Yvalue' INTO : BaseY FROM
Have  WHERE Xtime <= 0
  group by subject, period;
QUIT; 

%MACRO AUC(baseline, dataset, output); 
DATA &output;
SET &dataset (WHERE=(Xtime GE 0));
RETAIN Basevalue;
IF &baseline = 0 THEN Basevalue = 0.0;
* &BaseY shown in the following statement is the macro variable defined in any one of the above SQL
procedures;
IF (&baseline = 1 OR &baseline = 2) AND _N_ = 1 THEN Basevalue =
&BaseY;
Yvalue = Yvalue - Basevalue;
DROP LagTime LagValue;
LagTime = LAG(Xtime);
LagValue = LAG(Yvalue);
IF Xtime = 0 THEN DO;
 LagTime = 0;
 LagValue = 0;
END; 
IF &baseline = 2 AND Yvalue > 0 AND LagValue <= 0.0 THEN DO;
* Connecting line with positive slope, only the area of right triangle (above baseline) is counted.;
 DROP Ratio;
 Ratio = Yvalue / (ABS(LagValue)+Yvalue);
 Trapezoid = Ratio*(Xtime-LagTime)*(Yvalue+0.00)/2;
 END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue >= 0.0 THEN DO;
* Connecting line with negative slope, only the area of left triangle (above baseline) is counted.;
 DROP Ratio;
 Ratio = LagValue / (LagValue+ABS(Yvalue));
 Trapezoid = Ratio*(Xtime-LagTime)*(0.00+LagValue)/2;
 END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue < 0 THEN Trapezoid =
0.0;
* Negative trapezoidal area is not counted.;
ELSE Trapezoid = (Xtime-LagTime)*(Yvalue+LagValue)/2;
* The rest of all positive trapezoidal areas are counted.;
SumTrapezoid + Trapezoid;
FORMAT Trapezoid SumTrapezoid 8.3;
RUN;
%MEND AUC;

%AUC(2, have, want);

 


 

Ksharp
Super User

@Rick_SAS wrote a blog about it a couple of years ago.

DrSolverson
Calcite | Level 5

It's a really nice macro! Very useful for oral glucose tolerance test data sets - the three different ways it calculates AUC is attractive. The issue I'm having is that it was written for analysis of 1 subject's OGTT data, where I would like to apply it to several subjects (and their respective treatments). As the above users would tell you, I am really inexperienced (macros being a new area entirely), but this code will be useful if I can get the help I need. I've also consulted with a statistician on my research campus - I'll post their version when I hear back. 

Reeza
Super User
Did you try the solution I posted. You should be able to replace your data set name and have it run for all and it will generate the output data set for you....
DrSolverson
Calcite | Level 5

Hi Reeza,

 

Thank you for providing these directions. I am tinkering with it as we speak! Will follow-up.

DrSolverson
Calcite | Level 5

Hi all,

Sorry for the delayed response, still waiting on the script the campus statisticians are building - will provide when I receive it.

In the meantime, here's the code based almost entirely off of the directions Reeza provided (thank you, Reeza!) along with some sample data at the end to show that their iteration works. 

 

When working with the entire dataset, I have to restart SAS before switching between the three options to calculate AUC, otherwise baseline will not be calculated correctly - it will apply the first subject and period to all subjects and periods. I tried deleting the SQL table before switching  between options and that doesn't appear to resolve the issue. Insignificant inconvenience for what is gained by the program.

 

%MACRO AUC(dataset, baseline, subject, treatment, output); 

proc sql noprint;
select mean(yValue) format=6.2 into :baseY
from &dataset
where subject="&subject." and treatment="&treatment." and Xtime <= 0;
quit;

DATA &output;
SET &dataset;
where subject="&subject." and treatment="&treatment." and Xtime GE 0;
RETAIN Basevalue;
IF &baseline = 0 THEN Basevalue = 0.0;
* &BaseY shown in the following statement is the macro variable defined in any one of the above SQL
procedures;
IF (&baseline = 1 OR &baseline = 2) AND _N_ = 1 THEN Basevalue =
&BaseY;
if Xtime=0 then Yvalue = &BaseY;
Yvalue = Yvalue - Basevalue;
DROP LagTime LagValue;
LagTime = LAG(Xtime);
LagValue = LAG(Yvalue);
IF Xtime = 0 THEN DO;
 LagTime = 0;
 LagValue = 0;
END; 
IF &baseline = 2 AND Yvalue > 0 AND LagValue <= 0.0 THEN DO;
* Connecting line with positive slope, only the area of right triangle (above baseline) is counted.;
 DROP Ratio;
 Ratio = Yvalue / (ABS(LagValue)+Yvalue);
 Trapezoid = Ratio*(Xtime-LagTime)*(Yvalue+0.00)/2;
 END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue >= 0.0 THEN DO;
* Connecting line with negative slope, only the area of left triangle (above baseline) is counted.;
 DROP Ratio;
 Ratio = LagValue / (LagValue+ABS(Yvalue));
 Trapezoid = Ratio*(Xtime-LagTime)*(0.00+LagValue)/2;
 END;
ELSE IF &baseline = 2 AND Yvalue < 0 AND LagValue < 0 THEN Trapezoid =
0.0;
* Negative trapezoidal area is not counted.;
ELSE Trapezoid = (Xtime-LagTime)*(Yvalue+LagValue)/2;
* The rest of all positive trapezoidal areas are counted.;
SumTrapezoid + Trapezoid;
FORMAT Trapezoid SumTrapezoid 8.3;
RUN;
%MEND AUC;

PROC SQL;
create table macro_call as
SELECT distinct 2 as baseline, subject, treatment
from gluc2;
QUIT; 
*execute the macro for each line; data demo; set macro_call; str =
catt('%AUC(dataset=gluc2', ', baseline=', baseline, ', subject = ', subject, ', treatment = 
', treatment, ', output= OUT_', catx('_', subject, treatment), ');'); call execute(str); run;
data combined_out_baseline2;
set out_: ;
run;

proc sort data = combined_out_baseline2;
by subject period;
run;

proc print data = combined_out_baseline2;
run;

/*code below is for if you just want to try running the macro on one subject and period instead of the whole dataset*/
/*(dataset, baseline, subject, treatment, output)*/
%AUC(gluc2,1,5919,CX, practice);
proc print data = practice;
run;

/*heres the whole dataset*/
data gluc;
input Subject $ period $ xtime yvalue; 
datalines;
5901	1	-15	110.5
5901	1	0	113.6
5901	1	30	205
5901	1	60	240
5901	1	90	230
5901	1	120	195
5901	1	150	125
5901	1	180	105.2
5901	2	-15	110.1
5901	2	0	110.6
5901	2	30	201.1
5901	2	60	270.5
5901	2	90	220.6
5901	2	120	205.4
5901	2	150	180.7
5901	2	180	130.6
5901	3	-15	115.2
5901	3	0	115.9
5901	3	30	188.8
5901	3	60	260.5
5901	3	90	250.6
5901	3	120	180.4
5901	3	150	145.7
5901	3	180	115.0
;
run;

/* Add treatment variable to glucose dataset based on period info */
data gluc2;
set gluc;
length treatment $8;
if	subject	=	'5901'   and period = '1' then	treatment	=   'CX';
if	subject	=	'5901'   and period = '2' then	treatment	=   'BY';
if	subject	=	'5901'   and period = '3' then	treatment	=   'AZ';
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2510 views
  • 1 like
  • 4 in conversation