Hello! I have a dataset that is in long format. Each subject is supposed to have an outcome score at five timepoints: 0 hours, 5 minutes, 10 minutes, 15 minutes, and 30 minutes.
My data look like this:
data datafile;
input subject_number timepoint $ minutes score remove ;
datalines;
1 0h 0 10 0
1 5m 5 9 0
1 10m 10 10 0
1 15m 15 10 0
1 30m 30 11 0
2 0h 0 11 0
2 5m 5 . 0
2 15m 15 10 0
2 30m 30 . 1
;
run;
These data include only two subjects, for simplicity's sake.
Variables:
subject_number = subject number
timepoint = timepoint at which each score was taken
minutes = x-axis variable. This is the numeric version of timepoint
score = the y-axis/dependent variable
remove = 1 if either endpoint (0h or 30m) is missing
My code would look something like this [with some of my own notes inserted as comments]:
/*Note: If subject is missing start time (0h) or end time (30m), then remove = 1*/
/*use the 'linear up, log down' method*/
data datafile2;
set datafile;
/* linear formula:
auc = 1/2* (score_i +score_i+1) * (t_i+1 - t_i)*/
/*When I had calculated area using only the linear trapezoidal method, this is what code I used: */
lagtime = lag(minutes);
lagvalue = lag(score);
if minutes = 0 then do;
lagtime = 0;
lagvalue = 0;
end;
trapezoidScore = (minutes-lagtime)*(score + lagvalue)/2;
SumTrapezoidScore + trapezoidScore;
/*log formula: (score_i - score_i+1)/(ln(score_i)-ln(score_i+1))*(t_i+1-t_i)*/
run;
Basically, what I need to code is:
(a) If the score at timepoint i+1 is greater than or equal to the score at timepoint i, then use the linear trapezoidal method to calculate the area from timepoint i to timepoint i+1.
(b) If the score at timepoint i+1 is less than the score at timepoint i, then use the logarithmic trapezoidal method to calculate the area from timepoint i to timepoint i+1.
The linear trapezoidal method formula (what you'd use to calculate the area going 'up') is:
$$AUC = \frac{1}{2}\,(C_1 + C_2)\,(t_2 - t_1)$$
where C1 and C2 are the y values (scores in our case), and t1 and t2 are the timepoints on the x-axis.
The logarithmic trapezoidal method formula (what you'd use to calculate the area going 'down') is:
$$AUC = \frac{C_1 - C_2}{\ln(C_1) - \ln(C_2)}\,(t_2 - t_1)$$
I took these formulas and this idea of "linear-up log-down" from this short article.
If, say, the score at timepoint i+1 is missing but the scores at timepoint i and timepoint i+2 are not, then calculate the area using the scores and times at timepoints i and i+2 (using the linear trapezoidal method if the score at timepoint i+2 is >= the score at timepoint i, and the logarithmic trapezoidal method otherwise). Subject 2 at 5 minutes is an example of such a timepoint.
Examples in the sample data:
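As a rough illustration, here is how rules (a) and (b) would play out on a few segments of the sample data. The values are rounded to three decimals, and the choice of segments is only meant as an example:

```latex
\begin{align*}
\text{Subject 1, 0 to 5 min (10 to 9, down, log):}\quad
  & \frac{10 - 9}{\ln 10 - \ln 9}\,(5 - 0) \approx 47.456 \\
\text{Subject 1, 5 to 10 min (9 to 10, up, linear):}\quad
  & \tfrac{1}{2}\,(9 + 10)\,(10 - 5) = 47.5 \\
\text{Subject 2, 0 to 15 min (11 to 10, down, log):}\quad
  & \frac{11 - 10}{\ln 11 - \ln 10}\,(15 - 0) \approx 157.381
\end{align*}
```

The last line bridges from 0 to 15 minutes because subject 2's 5-minute score is missing and there is no 10-minute record.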
Then, take all of the AUCs calculated above (including those for the timepoints I didn't give an example for) and sum them per subject.
I need help with how to do this, since my data are in long format. How do I tell SAS to look at the current record and the previous record, apply a different formula depending on which score is bigger, and, if a timepoint is missing, skip it and go on to the next one? One thing to note: a missing score does not always appear as a record with "." like subject 2 at 30m. Sometimes the score is there as "." (missing), and sometimes the subject has no record for that timepoint at all. For this example, I simply didn't include a 10m record for subject 2, so SAS would need to know to skip over the missing 10m.
Thank you so much!!!
Best,
Gina
A BY statement creates the variables FIRST.subject_number and LAST.subject_number for you. You can use these to detect when the subject number has just changed, or is about to change on the next record read.
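A minimal sketch of that idea, not a definitive solution: sort by subject and time, RETAIN the last non-missing time and score, and compute one segment each time a non-missing score is read. Rows with a missing score and timepoints with no record are then skipped in the same way. The names sorted, auc_by_subject, prev_time, prev_score, segment_auc, auc_total, and any_remove are made up for illustration. Falling back to the linear formula when a score is 0 or negative (where the log is undefined) is an assumption, as is carrying the remove flag forward rather than dropping subjects.

```sas
/* Put each subject's records in time order */
proc sort data=datafile out=sorted;
  by subject_number minutes;
run;

data auc_by_subject (keep=subject_number auc_total any_remove);
  set sorted;
  by subject_number;
  retain prev_time prev_score any_remove;

  /* Reset the carried-over values at the start of each subject */
  if first.subject_number then do;
    prev_time  = .;
    prev_score = .;
    auc_total  = 0;
    any_remove = 0;
  end;

  /* Remember whether any record of this subject was flagged */
  any_remove = max(any_remove, remove);

  /* Only non-missing scores start or end a segment */
  if not missing(score) then do;
    if not missing(prev_score) then do;
      dt = minutes - prev_time;
      /* Linear when going up or flat; also used when score <= 0 (assumption) */
      if score >= prev_score or score <= 0 then
        segment_auc = dt * (prev_score + score) / 2;
      /* Logarithmic when going down */
      else
        segment_auc = dt * (prev_score - score) / (log(prev_score) - log(score));
      auc_total + segment_auc;   /* sum statement: auc_total is retained */
    end;
    prev_time  = minutes;
    prev_score = score;
  end;

  /* One output row per subject */
  if last.subject_number then output;
run;
```

Working the sample data by hand, subject 1 should come to about 302.456 (47.456 + 47.5 + 50 + 157.5). Subject 2 would have only the 0 to 15 minute segment and any_remove = 1, so it can be filtered out afterwards with a WHERE clause if that is what you want.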