Completely new to this so very thankful for your help.
In a crossover study with Treatments X, Y, and Z (order is randomly assigned), there are repeated measurements at 3 time points (Time 1, 2, and 3) under each treatment. I need to find the difference between time 2 and time 1 under treatments X and Y, and then the difference between those two values, i.e. (Y2-Y1) - (X2-X1). I used the wide data format and my code works, but it is highly repetitive because X and Y can occur in any treatment period:
if treatment_1 = "X" then difference_X = treatment_1_time2 - treatment_1_time1;
if treatment_2 = "X" then difference_X = treatment_2_time2 - treatment_2_time1;
if treatment_3 = "X" then difference_X = treatment_3_time2 - treatment_3_time1;
if treatment_1 = "Y" then difference_Y = treatment_1_time2 - treatment_1_time1;
if treatment_2 = "Y" then difference_Y = treatment_2_time2 - treatment_2_time1;
if treatment_3 = "Y" then difference_Y = treatment_3_time2 - treatment_3_time1;
and then I subtracted difference_X from difference_Y. I tried using arrays and DO loops, but since treatment X can occur in treatment period 1, 2, or 3, I couldn't make it work with fewer lines of code. Would something like this be accomplished more efficiently using the long format? Or is there a data formatting/cleaning function that I need to learn?
Can you provide an example of what your data set looks like, as data step code?
I am a tad concerned that you are assigning values to the same variable based on multiple variables. If any of your observations has more than one "set" of values, your example code may be incomplete, since you would probably need multiple difference results.
Hi ballardw, thank you so much for your response. The data set looks like this
data try;
input treatment_1 $ treatment_2 $ treatment_3 $ treatment_1_time1 treatment_1_time2 treatment_1_time3 treatment_2_time1 treatment_2_time2 treatment_2_time3 treatment_3_time1 treatment_3_time2 treatment_3_time3;
datalines;
X Y Z 3.4 4.2 5.3 8.9 7.4 3.5 2.7 2.8 9.2
Y Z X 8.5 7.2 6.6 4.0 1.3 6.4 6.3 2.5 5.1
Z X Y 2.1 7.7 5.7 5.8 7.0 4.2 1.0 1.2 7.0
;
The values under treatment_# represent the first, second, and third periods of a crossover study, during which X, Y, or Z could be administered. The values under treatment_#_time# represent the outcome at that time of measurement within that period. I apologize for how confusing my naming convention is. My goal here is to find the difference between time 1 and time 2 within a period (for example, treatment_1_time2 - treatment_1_time1), but only for the periods in which X or Y was administered. The ultimate goal is to evaluate the effect of treatment order on the outcome, as in a traditional AB/BA crossover study. Unlike a simple 2x2 study, though, there is a third treatment and there are multiple measurements during each treatment period.
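For what it's worth, the six IF statements can probably be collapsed with parallel arrays indexed by period. This is only a sketch against the `try` data set posted above; the output data set name `want_wide` and the variable `diff_Y_minus_X` are made up for illustration, and it assumes each subject receives X and Y exactly once.

```sas
data want_wide;
  set try;
  /* One array element per treatment period (1, 2, 3). */
  array trt[3] treatment_1-treatment_3;
  array t1[3] treatment_1_time1 treatment_2_time1 treatment_3_time1;
  array t2[3] treatment_1_time2 treatment_2_time2 treatment_3_time2;

  /* Look at each period; store the time2 - time1 change under */
  /* whichever treatment was given in that period.             */
  do period = 1 to 3;
    if trt[period] = "X" then difference_X = t2[period] - t1[period];
    else if trt[period] = "Y" then difference_Y = t2[period] - t1[period];
  end;

  /* (Y2 - Y1) - (X2 - X1); hypothetical variable name. */
  diff_Y_minus_X = difference_Y - difference_X;
  drop period;
run;
```

For the first row of `try` (X in period 1, Y in period 2) this should give difference_X = 0.8, difference_Y = -1.5, and diff_Y_minus_X = -2.3. The idea is that the arrays line up the treatment code with its measurements, so the loop does not need to know in advance which period X or Y landed in.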
Sounds like you mean you have this data:
data have;
input subject leg treatment $ time result ;
datalines;
1 1 X 1 3.4
1 1 X 2 4.2
1 1 X 3 5.3
1 2 Y 1 8.9
1 2 Y 2 7.4
1 2 Y 3 3.5
1 3 Z 1 2.7
1 3 Z 2 2.8
1 3 Z 3 9.2
2 1 Y 1 8.5
2 1 Y 2 7.2
2 1 Y 3 6.6
2 2 Z 1 4
2 2 Z 2 1.3
2 2 Z 3 6.4
2 3 X 1 6.3
2 3 X 2 2.5
2 3 X 3 5.1
3 1 Z 1 2.1
3 1 Z 2 7.7
3 1 Z 3 5.7
3 2 X 1 5.8
3 2 X 2 7
3 2 X 3 4.2
3 3 Y 1 1
3 3 Y 2 1.2
3 3 Y 3 7
;
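If you only have the wide version, a reshaping step along these lines should produce that layout. It is a sketch under a few assumptions: the input is the `try` data set from earlier in the thread, there is no subject identifier in it so the row number `_n_` is used as `subject`, and the output name `have_long` is made up so that it does not overwrite `have`.

```sas
data have_long;
  set try;
  array trt[3] treatment_1-treatment_3;
  /* 3 x 3 array: first index = period (leg), second index = time. */
  /* SAS fills multidimensional arrays row by row.                 */
  array res[3,3] treatment_1_time1-treatment_1_time3
                 treatment_2_time1-treatment_2_time3
                 treatment_3_time1-treatment_3_time3;

  subject = _n_;  /* assumption: one row per subject, no ID variable */
  do leg = 1 to 3;
    treatment = trt[leg];
    do time = 1 to 3;
      result = res[leg, time];
      output;       /* one output row per subject x leg x time */
    end;
  end;
  keep subject leg treatment time result;
run;
```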
Hi Tom, yes this really clarifies what I was trying to convey, thank you! I do have a version of my data that looks like this, however, I'm still at a loss as to how I would output a column that subtracts time 1 results from time 2 only when treatments X and Y are administered and then find the differences between those two in the long format. Down the line when I conduct data analysis comparing this "result difference" between the XY/YX groups, wouldn't the wide format that has one "result difference" per subject be easier to work with?
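One possible way to do this from the long layout, and still end up with one row per subject for the analysis, is sketched below. It assumes the `have` data set posted above, with one time 1 and one time 2 record per subject and treatment; the data set names `sorted`, `diffs`, `wide`, and `want` and the variable `diff_Y_minus_X` are made up for illustration.

```sas
proc sort data=have out=sorted;
  by subject treatment time;
run;

/* Keep only X and Y, times 1 and 2, and compute time2 - time1. */
data diffs;
  set sorted;
  where treatment in ("X", "Y") and time in (1, 2);
  by subject treatment;
  retain time1;
  if first.treatment then time1 = .;   /* reset for each new block */
  if time = 1 then time1 = result;
  if time = 2 then do;
    difference = result - time1;
    output;                            /* one row per subject x treatment */
  end;
  keep subject treatment difference;
run;

/* Turn the two rows per subject into difference_X and difference_Y. */
proc transpose data=diffs out=wide(drop=_name_) prefix=difference_;
  by subject;
  id treatment;
  var difference;
run;

data want;
  set wide;
  diff_Y_minus_X = difference_Y - difference_X;  /* (Y2-Y1) - (X2-X1) */
run;
```

So the long format is used for the calculation and the final `want` table is wide again, with one "result difference" per subject. If you also need the XY/YX grouping, the `leg` values for X and Y could be carried along in the same way.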