Hello! I have a dataset in long format. Each subject is supposed to have an outcome score at these timepoints: 0 hours, 5 minutes, 10 minutes, 15 minutes, and 30 minutes.
My data look like this:
data datafile;
input subject_number timepoint $ minutes score remove ;
datalines;
1 0h 0 10 0
1 5m 5 9 0
1 10m 10 10 0
1 15m 15 10 0
1 30m 30 11 0
2 0h 0 11 0
2 5m 5 . 0
2 15m 15 10 0
2 30m 30 . 1
;
run;
These data include only two subjects for simplicity's sake.
Variables:
subject_number = subject number
timepoint = timepoint each response variable was taken at
minutes = x-axis variable. This is the numeric version of timepoint
score = the y-axis/dependent variable
remove = this variable = 1 if either endpoint is missing (0h or 30 min)
My code would look something like this [with some of my own notes inserted as comments]:
/*Note: If subject is missing start time (0h) or end time (30m), then remove = 1*/
/*use the 'linear up, log down' method*/
data datafile2;
set datafile;
/* linear formula:
auc = 1/2* (score_i +score_i+1) * (t_i+1 - t_i)*/
/*When I calculated the area using only the linear trapezoidal method, this is the code I used: */
lagtime = lag(minutes);
lagvalue = lag(score);
if minutes = 0 then do;
lagtime = 0;
lagvalue = 0;
end;
trapezoidScore = (minutes - lagtime) * (score + lagvalue) / 2;
SumTrapezoidScore + trapezoidScore;
/*log formula: (score_i - score_i+1)/(ln(score_i)-ln(score_i+1))*(t_i+1-t_i)*/
run;
Basically, what I need to code is:
(a) If the score at timepoint i+1 is greater than or equal to the score at timepoint i, then use the linear trapezoidal method to calculate the area from timepoint i to timepoint i+1.
(b) If the score at timepoint i+1 is less than the score at timepoint i, then use the logarithmic trapezoidal method to calculate the area from timepoint i to timepoint i+1.
The linear trapezoidal method formula (what you'd use to calculate the area going 'up') is:
AUC = 0.5 * (C1 + C2) * (t2 - t1)
where C1 and C2 are the y values (scores in our case), and t1 and t2 are the timepoints on the x-axis.
The logarithmic trapezoidal method formula (what you'd use to calculate the area going 'down') is:
AUC = [(C1 - C2)/(ln(C1) - ln(C2))] * (t2 - t1)
I took these formulas and this idea of "linear-up log-down" from this short article.
If, say, the score at timepoint i+1 is missing but the scores at timepoint i and timepoint i+2 are not, then calculate the area using the scores and times at timepoint i and timepoint i+2. Depending on whether the score at timepoint i+2 is >= or < the score at timepoint i, use the linear trapezoidal method or the logarithmic trapezoidal method, respectively. Subject 2 at 5 minutes is an example of such a timepoint.
Examples in the sample data:
Subject 1 from 0h to 5m, his score goes down from 10 to 9. Here, we'd use the logarithmic trapezoidal method formula to calculate the area under the curve for this section:
AUC_0h_5m = [(10 - 9)/(ln(10) - ln(9))]*(5-0)
Subject 1 from 5m to 10m, his score goes up from 9 to 10. Here, we'd use the linear trapezoidal method formula to calculate the area under the curve for this section:
AUC_5m_10m = 0.5(9 + 10) * (10 - 5)
Subject 1 from 10m to 15m, his score remains the same. Use the linear trapezoidal method formula to calculate the area under the curve for this section:
AUC_10m_15m = 0.5(10+10) * (15-10)
Subject 2 has a score at 0h and 15m, but is missing a score at 5m and has no 10m record. Also, his score went down from timepoint 0h to 15m. Use the logarithmic trapezoidal method to calculate the area under the curve for this section:
AUC_0h_15m = [(11-10)/(ln(11)-ln(10))]*(15-0)
Subject 2 is missing a score at the last timepoint, which is 30m. We will ignore this part.
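As a quick check of the arithmetic in the examples above, the four segment areas can be evaluated directly. This is only a sketch: the variable names (auc_s1_0_5 and so on) are made up for illustration, and SAS's log() function is the natural logarithm.

```sas
/* Evaluate the worked examples above; nothing is read from datafile.
   Variable names are hypothetical, chosen only for this illustration. */
data _null_;
  /* Subject 1, 0h to 5m: score falls 10 -> 9, logarithmic trapezoid */
  auc_s1_0_5   = (10 - 9) / (log(10) - log(9)) * (5 - 0);
  /* Subject 1, 5m to 10m: score rises 9 -> 10, linear trapezoid (= 47.5) */
  auc_s1_5_10  = 0.5 * (9 + 10) * (10 - 5);
  /* Subject 1, 10m to 15m: score unchanged, linear trapezoid (= 50) */
  auc_s1_10_15 = 0.5 * (10 + 10) * (15 - 10);
  /* Subject 2, 0h to 15m: score falls 11 -> 10, logarithmic trapezoid */
  auc_s2_0_15  = (11 - 10) / (log(11) - log(10)) * (15 - 0);
  /* Write the four values to the SAS log */
  put auc_s1_0_5= auc_s1_5_10= auc_s1_10_15= auc_s2_0_15=;
run;
```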
Then, take all of the AUCs calculated above (plus those for the timepoints that I didn't give an example for) and sum them per subject.
I need help on how to do this since my data are in long format. How do I tell SAS to look back at the previous record, implement a different formula depending on which score is bigger, and, if a timepoint is missing, skip it and go on to the next one? One thing to note: a missing timepoint does not always appear as a record with a score of "." (like subject 2 at 30m). Sometimes the record is there with a missing score, and sometimes there is no record at all. For the sake of this example, I just didn't include a 10m record for subject 2, so SAS would need to know to skip over the missing 10m.
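One possible way to approach this, offered as a minimal sketch rather than a definitive solution, is to carry the last non-missing time and score forward with RETAIN instead of using LAG. A record with a missing score and a record that does not exist are then handled the same way: neither updates the retained values, so the next non-missing record pairs with the last usable one. The names sorted, auc_by_subject, prev_time, prev_score, dt, auc, and any_remove are hypothetical. The sketch also assumes that a falling segment whose lower score is zero or negative should fall back to the linear formula, since the logarithm is undefined there.

```sas
/* Sketch only: linear-up/log-down AUC per subject from long-format data.
   Assumes the dataset datafile defined earlier in this post. */
proc sort data=datafile out=sorted;
  by subject_number minutes;
run;

data auc_by_subject (keep=subject_number auc any_remove);
  set sorted;
  by subject_number;
  retain prev_time prev_score auc any_remove;

  /* Reset the carried-forward values at the start of each subject */
  if first.subject_number then do;
    prev_time  = .;
    prev_score = .;
    auc        = 0;
    any_remove = 0;
  end;

  /* Remember whether any record of this subject was flagged remove = 1 */
  if remove = 1 then any_remove = 1;

  /* Records with a missing score are skipped; absent records never appear */
  if not missing(score) then do;
    if not missing(prev_score) then do;
      dt = minutes - prev_time;
      if score < prev_score and score > 0 then
        /* score went down: logarithmic trapezoid */
        auc = auc + (prev_score - score) / (log(prev_score) - log(score)) * dt;
      else
        /* score went up or stayed the same: linear trapezoid */
        auc = auc + 0.5 * (prev_score + score) * dt;
    end;
    /* This record becomes the "previous" point for the next segment */
    prev_time  = minutes;
    prev_score = score;
  end;

  /* One output row per subject with the summed AUC */
  if last.subject_number then output;
run;
```

With the sample data, subject 2's only segment runs from 0h to 15m, because the 5m score is missing, there is no 10m record, and the 30m score is missing. The any_remove flag is kept so that subjects with a missing endpoint can be filtered out afterwards if that is the intended rule.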
Thank you so much!!!
Best,
Gina