ginak
Quartz | Level 8

Hello! I have a dataset that is in long format. Each subject is supposed to have an outcome score at each of these timepoints: 0 hours, 5 minutes, 10 minutes, 15 minutes, and 30 minutes.

 

My data look like this:

 


data  datafile;
	input subject_number timepoint $ minutes score remove ;
	datalines;
	1 0h 0 10 0 
	1 5m 5 9 0 
	1 10m 10 10 0 
	1 15m 15 10 0 
	1 30m 30 11 0 
	2 0h 0 11 0 
	2 5m 5 . 0 
	2 15m 15 10 0 
	2 30m 30 . 1 
	;
run;


These data consist of only two subjects for simplicity's sake.

 

Variables:

subject_number = subject number

timepoint = timepoint each response variable was taken at

minutes = x-axis variable. This is the numeric version of timepoint 

score = the y-axis/dependent variable

remove = this variable = 1 if either endpoint is missing (0h or 30 min)

 

My code would look something like this [with some of my own notes inserted as comments]:


/*Note: If subject is missing start time (0h) or end time (30m), then remove = 1*/
/*Use the 'linear up, log down' method*/

data datafile2;
	set datafile;

/*	Linear formula:
	auc = 1/2 * (score_i + score_i+1) * (t_i+1 - t_i)*/

/*When I calculated the area using only the linear trapezoidal method, this is the code I used:*/
	lagtime = lag(minutes);
	lagvalue = lag(score);
	if minutes = 0 then do;
		lagtime = 0;
		lagvalue = 0;
	end;
	trapezoidScore = (minutes - lagtime) * (score + lagvalue) / 2;
	SumTrapezoidScore + trapezoidScore;

	/*Log formula: auc = (score_i - score_i+1) / (ln(score_i) - ln(score_i+1)) * (t_i+1 - t_i)*/
run;

 

Basically, what I need to code is:

(a) If the score at timepoint i+1 is greater than or equal to the score at timepoint i, then use the linear trapezoidal method to calculate the area from timepoint i to timepoint i+1.

(b) If the score at timepoint i+1 is less than the score at timepoint i, then use the logarithmic trapezoidal method to calculate the area from timepoint i to timepoint i+1.

 

The linear trapezoidal method formula (what you'd use to calculate the area going 'up') is: AUC = 0.5 * (C1 + C2) * (t2 - t1), where C1 and C2 are the y values (scores in our case), and t1 and t2 are the timepoints on the x-axis.

 

The logarithmic trapezoidal method formula (what you'd use to calculate the area going 'down') is: AUC = (C1 - C2) / (ln(C1) - ln(C2)) * (t2 - t1)

 

I took these formulas and this idea of "linear-up log-down" from this short article.

If, say, the score at timepoint i+1 is missing, but the scores at timepoint i and timepoint i+2 are not, then calculate the area using the scores and times at timepoints i and i+2. As before, use the linear trapezoidal method if the score at timepoint i+2 is greater than or equal to the score at timepoint i, and the logarithmic trapezoidal method otherwise. Subject 2 at 5 minutes is an example of such a timepoint.

 

Examples in the sample data: 

  • Subject 1 from 0h to 5m, his score goes down from 10 to 9. Here, we'd use the logarithmic trapezoidal method formula to calculate the area under the curve for this section:
    • AUC_0h_5m =  [(10 - 9)/(ln(10) - ln(9))]*(5-0)
  • Subject 1 from 5m to 10m, his score goes up from 9 to 10. Here, we'd use the linear trapezoidal method formula to calculate the area under the curve for this section:
    • AUC_5m_10m = 0.5(9 + 10) * (10 - 5)
  • Subject 1 from 10m to 15m, his score remains the same. Use the linear trapezoidal method formula to calculate the area under the curve for this section:
    • AUC_10m_15m = 0.5(10+10) * (15-10)
  • Subject 2 has a score at 0h and 15m, but is missing a score at 5m and has no 10m record. Also, his score went down from 0h to 15m. Use the logarithmic trapezoidal method to calculate the area under the curve for this section:
    • AUC_0h_15m = [(11-10)/(ln(11)-ln(10))]*(15-0)
  • Subject 2 is missing a score at the last timepoint, which is 30m. We will ignore this part.

Then, take all of the AUCs calculated above (plus those for the timepoints that I didn't give an example for) and sum them per subject.
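Worked out numerically for the sample data, the per-subject sums would come to roughly the values below. This assumes subject 1's remaining 15m to 30m segment (score 10 to 11, going up) uses the linear formula, as rule (a) implies. Values are rounded to three decimals.

```latex
\begin{align*}
\text{Subject 1:}\quad
\mathrm{AUC}_{0\text{h}\to 5\text{m}}   &= \frac{10 - 9}{\ln 10 - \ln 9}\,(5 - 0) \approx 47.456 \\
\mathrm{AUC}_{5\text{m}\to 10\text{m}}  &= \tfrac{1}{2}(9 + 10)(10 - 5) = 47.5 \\
\mathrm{AUC}_{10\text{m}\to 15\text{m}} &= \tfrac{1}{2}(10 + 10)(15 - 10) = 50 \\
\mathrm{AUC}_{15\text{m}\to 30\text{m}} &= \tfrac{1}{2}(10 + 11)(30 - 15) = 157.5 \\
\mathrm{AUC}_{\text{total}}             &\approx 47.456 + 47.5 + 50 + 157.5 = 302.456 \\[6pt]
\text{Subject 2:}\quad
\mathrm{AUC}_{0\text{h}\to 15\text{m}}  &= \frac{11 - 10}{\ln 11 - \ln 10}\,(15 - 0) \approx 157.381 \\
\mathrm{AUC}_{\text{total}}             &\approx 157.381 \quad \text{(the 15m to 30m segment is ignored)}
\end{align*}
```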

 

I need help with how to do this since my data are in long format. How do I tell SAS to compare each record with the previous one, apply a different formula depending on which score is bigger, and skip over a timepoint whose score is missing? Something to note: a missing score doesn't always appear as a record with a score of ".", like subject 2 at 30m. Sometimes the record doesn't exist at all. For the sake of this example, I left out the 10m record for subject 2, so SAS would need to know to skip over the missing 10m.

 

Thank you so much!!!

Best,

Gina

1 REPLY
PhilC
Rhodochrosite | Level 12

A BY statement creates the automatic variables FIRST.subject_number and LAST.subject_number for you. You can use these to detect when the subject number has just changed, or is about to change on the next record read. The data need to be sorted by subject_number for this to work.

 

SAS Help Center: BY Statement
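One way the BY-group idea could be combined with the "linear up, log down" rule is sketched below. This is only a sketch under stated assumptions, not a tested solution. It uses the data set and variables from the question (datafile, subject_number, minutes, score). The names sorted, auc_by_subject, prev_time, prev_score, segment_auc and auc_total are made up for illustration. It also assumes every non-missing score is strictly positive, because the log formula is undefined otherwise.

```sas
/* Sort so that each subject's records are together and in time order. */
proc sort data=datafile out=sorted;
	by subject_number minutes;
run;

data auc_by_subject;
	set sorted;
	by subject_number;

	/* Hypothetical helper variables: the last non-missing time and score. */
	retain prev_time prev_score;

	if first.subject_number then do;
		prev_time  = .;
		prev_score = .;
		auc_total  = 0;
	end;

	/* Rows with a missing score are skipped, and absent rows never appear,
	   so both cases are bridged from the last non-missing timepoint. */
	if not missing(score) then do;
		if not missing(prev_score) then do;
			if score >= prev_score then
				/* Up or flat: linear trapezoid. */
				segment_auc = (minutes - prev_time) * (prev_score + score) / 2;
			else
				/* Down: logarithmic trapezoid (assumes both scores > 0). */
				segment_auc = (minutes - prev_time) * (prev_score - score)
				              / (log(prev_score) - log(score));
			auc_total + segment_auc;   /* sum statement: retained automatically */
		end;
		prev_time  = minutes;
		prev_score = score;
	end;

	/* One output row per subject. */
	if last.subject_number then output;
	keep subject_number auc_total;
run;
```

Because prev_time and prev_score are only updated on rows with a non-missing score, a missing 5m score and a nonexistent 10m record are handled the same way. A trailing missing score, such as subject 2 at 30m, adds nothing to the total. The sketch does not use the remove flag; subjects with a missing endpoint would still need to be excluded or flagged in a later step if that is required.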
