Hello! I have a dataset that is in long format. Each subject is supposed to have an outcome score at these timepoints: 0 hours, 5 minutes, 10 minutes, 15 minutes, and 30 minutes.
My data look like this:
data datafile;
input subject_number timepoint $ minutes score remove ;
datalines;
1 0h 0 10 0
1 5m 5 9 0
1 10m 10 10 0
1 15m 15 10 0
1 30m 30 11 0
2 0h 0 11 0
2 5m 5 . 0
2 15m 15 10 0
2 30m 30 . 1
;
run;
These data consist of only two subjects for simplicity's sake.
Variables:
subject_number = subject number
timepoint = timepoint each response variable was taken at
minutes = x-axis variable. This is the numeric version of timepoint
score = the y-axis/dependent variable
remove = 1 if either endpoint (0h or 30m) is missing
My code would look something like this [with some of my own notes inserted as comments]:
/* Note: if a subject is missing the start time (0h) or end time (30m), then remove = 1 */
/* Use the 'linear up, log down' method */
data datafile2;
set datafile;
/* Linear formula:
auc = 1/2 * (score_i + score_i+1) * (t_i+1 - t_i) */
/* When I calculated the area using only the linear trapezoidal method, this is the code I used: */
lagtime = lag(minutes);
lagvalue = lag(score);
if minutes = 0 then do;
lagtime = 0;
lagvalue = 0;
end;
trapezoidScore = (minutes - lagtime) * (score + lagvalue) / 2;
SumTrapezoidScore + trapezoidScore;
/* Log formula: (score_i - score_i+1) / (ln(score_i) - ln(score_i+1)) * (t_i+1 - t_i) */
run;
Basically, what I need to code is:
(a) If the score at timepoint i+1 is greater than or equal to the score at timepoint i, then use the linear trapezoidal method to calculate the area from timepoint i to timepoint i+1.
(b) If the score at timepoint i+1 is less than the score at timepoint i, then use the logarithmic trapezoidal method to calculate the area from timepoint i to timepoint i+1.
The linear trapezoidal method formula (what you'd use to calculate the area going 'up') is: AUC = 1/2 * (C1 + C2) * (t2 - t1), where C1 and C2 are the y values (scores in our case), and t1 and t2 are the timepoints on the x-axis.
The logarithmic trapezoidal method formula (what you'd use to calculate the area going 'down') is: AUC = (C1 - C2) / (ln(C1) - ln(C2)) * (t2 - t1).
I took these formulas and this idea of "linear-up log-down" from this short article.
If, say, the score at timepoint i+1 is missing but the scores at timepoint i and timepoint i+2 are not, then calculate the area using the scores and times at timepoint i and timepoint i+2 (using either the linear or the logarithmic trapezoidal method, depending on whether the score at timepoint i+2 is >= or < the score at timepoint i). Subject 2 at 5 minutes is an example of such a timepoint.
Examples in the sample data:
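As a rough worked check (not the original poster's own examples, which are not shown here), the segment areas for the sample data can be computed by hand from the two formulas in the code comments above. Values are rounded to two decimals. For subject 2 this assumes the missing 5m score and the absent 10m record are both skipped, so the first segment runs from 0 to 15 minutes.

```latex
\begin{align*}
\text{Subject 1, } 0 \to 5 \ (10 \to 9,\ \text{log}):
  &\quad \frac{10 - 9}{\ln 10 - \ln 9} \times (5 - 0) \approx 47.46 \\
\text{Subject 1, } 5 \to 10 \ (9 \to 10,\ \text{linear}):
  &\quad \tfrac{1}{2}(9 + 10)(10 - 5) = 47.5 \\
\text{Subject 1, } 10 \to 15 \ (10 \to 10,\ \text{linear}):
  &\quad \tfrac{1}{2}(10 + 10)(15 - 10) = 50 \\
\text{Subject 1, } 15 \to 30 \ (10 \to 11,\ \text{linear}):
  &\quad \tfrac{1}{2}(10 + 11)(30 - 15) = 157.5 \\
\text{Subject 1 total}:
  &\quad 47.46 + 47.5 + 50 + 157.5 \approx 302.46 \\[6pt]
\text{Subject 2, } 0 \to 15 \ (11 \to 10,\ \text{log}):
  &\quad \frac{11 - 10}{\ln 11 - \ln 10} \times (15 - 0) \approx 157.38
\end{align*}
```

Subject 2 has no usable score at 30m, so there is no segment after 15 minutes, and that subject is flagged with remove = 1 in the sample data.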
Then, take all of the AUCs calculated above (including the timepoints that I didn't give an example for) and sum them per subject.
I need help with how to do this since my data are in long format. How do I tell SAS to compare each record with the previous one, apply a different formula depending on which score is bigger, and skip a timepoint that is missing and move on to the next one? One thing to note: a missing timepoint does not always appear as a record with score "." (like subject 2 at 30m); sometimes the subject has no record at all for that timepoint. For this example, I left out the 10m record for subject 2, so SAS would need to know to skip over the absent 10m timepoint as well.
Thank you so much!!!
Best,
Gina
A BY statement creates the automatic variables FIRST.subject_number and LAST.subject_number. You can use these to detect when a new subject starts and when the current record is the last one for that subject.
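To illustrate the BY-group idea, here is a minimal sketch of one way it could be combined with RETAIN for the linear-up log-down sum. It is not a tested or definitive solution. It assumes the DATAFILE data set from the question, and the names sorted, auc_by_subject, prev_time, prev_score, first_time, dt, segment, auc and complete are made up for this example. Dropping records with a missing score first means an absent record and a "." record are handled the same way, so each remaining record is paired with the previous non-missing one. Falling back to the linear formula when the later score is not positive is also an assumption, made because the log formula is undefined there.

```sas
/* Sketch only: names other than DATAFILE and its variables are hypothetical. */
proc sort data=datafile out=sorted;
  by subject_number minutes;
run;

data auc_by_subject (keep=subject_number auc complete);
  set sorted;
  where score is not missing;   /* "." rows and absent rows now look the same */
  by subject_number;

  retain prev_time prev_score first_time auc;

  if first.subject_number then do;
    auc = 0;                    /* reset the running total for each subject */
    first_time = minutes;       /* first timepoint with a usable score */
  end;
  else do;
    dt = minutes - prev_time;
    /* linear when going up or flat; also used as an assumed fallback
       when score <= 0, because log() is undefined there */
    if score >= prev_score or score <= 0 then
      segment = dt * (prev_score + score) / 2;
    else
      segment = dt * (prev_score - score) / (log(prev_score) - log(score));
    auc + segment;              /* sum statement: accumulate within subject */
  end;

  /* remember this record for the next iteration */
  prev_time  = minutes;
  prev_score = score;

  if last.subject_number then do;
    /* 1 only if usable scores exist at both endpoints (0 and 30 minutes) */
    complete = (first_time = 0 and minutes = 30);
    output;
  end;
run;
```

On the sample data this should give one row per subject, with auc of about 302.46 and complete = 1 for subject 1, and auc of about 157.38 and complete = 0 for subject 2. A subject whose scores are all missing would not appear in the output at all, so that case would need separate handling.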