I have a dataset from a small RCT w/ 2 groups, 3 timepoints and > 80 continuous dependent variables. The analysis plan calls for two-sided independent samples t-tests for most of the dependent variables, w/ adjustment for multiple comparisons.*
I plan to use PROC TTEST to compare the mean change from each time point to the others for EACH dependent variable across the 2 groups.
I’m assuming I need to create new variables to represent the mean change between time points.
Example: Dependent Variable = Blood Pressure (BP) for Group A
BP_timepoint_1
BP_timepoint_2
BP_timepoint_3
Change_BP_timepoint_1_2 = BP_timepoint_1 - BP_timepoint_2
Change_BP_timepoint_1_3 = BP_timepoint_1 - BP_timepoint_3
Change_BP_timepoint_2_3 = BP_timepoint_2 - BP_timepoint_3
What is the best way - for a novice SAS user - to efficiently create a large number of new variables to represent the differences between time points for each of the dependent variables?
* I realize that running all these t-tests is inefficient, but that is what I have been asked to do. If you think I should use PROC GLM or PROC MIXED, instead, would I still need to create all these new variables with those approaches? I have no experience w/ either approach…
Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/
You could use an array to calculate the differences but if all your values are numeric it may also make sense to transpose your data so you can use BY group processing.
Switch your data to a format such as:
Variable Time1 Time2 Time3
BP       120   140   125
...
Then you can use BY in PROC TTEST to do all tests at once.
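* BY processing requires the data to be sorted by the BY variable;
proc sort data=long_form;
by variable;
run;
proc ttest data=long_form;
by variable;
* both pairs can be listed in a single PAIRED statement;
paired time1*time2 time2*time3;
run;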
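If you would rather stay with the wide layout, the array approach could be sketched roughly as below. This is only a minimal sketch: the data set WIDE, the variables bp_v1 through hr_v3, and the chg_ names are made up for illustration and would need to be replaced with the real variable lists. It computes the three change scores per measure and then runs the independent-samples t-tests across the two groups, which is what the analysis plan describes.

```sas
/* Hypothetical wide data: one row per participant, two measures, three visits */
data wide;
    input participant_id $ group $ bp_v1 bp_v2 bp_v3 hr_v1 hr_v2 hr_v3;
    datalines;
01 A 120 140 125 70 72 68
02 B 125 141 135 66 71 69
03 A 118 122 119 80 78 75
04 B 130 128 133 74 70 77
;
run;

data change;
    set wide;
    /* Parallel arrays: position i in each array is the same measure */
    array t1[*] bp_v1 hr_v1;
    array t2[*] bp_v2 hr_v2;
    array t3[*] bp_v3 hr_v3;
    /* Naming new variables in an ARRAY statement creates them */
    array d12[*] chg_bp_1_2 chg_hr_1_2;
    array d13[*] chg_bp_1_3 chg_hr_1_3;
    array d23[*] chg_bp_2_3 chg_hr_2_3;
    do i = 1 to dim(t1);
        d12[i] = t1[i] - t2[i];
        d13[i] = t1[i] - t3[i];
        d23[i] = t2[i] - t3[i];
    end;
    drop i;
run;

/* Independent-samples t-tests of every change score between the two groups */
proc ttest data=change;
    class group;
    var chg_:;
run;
```

With 80+ measures the only part that grows is the variable lists in the ARRAY statements; the DO loop stays the same.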
@Reeza I'm not sure how to transpose my data into that form.
It's currently in this form:
When I try to transpose it, it ends up like this:
This is my syntax:
proc sort data=meta.data_01;
by participant_id;
run;
proc transpose data=meta.data_01 out=meta.data_01_long prefix=value_;
by participant_id;
run;
PROC PRINT DATA=meta.data_01_long;
RUN;
Also, is there a way to adjust for multiple comparisons using "BY in PROC TTEST"?
Thank you.
I'm not entirely clear on this:
Variable Time1 Time2 Time3
BP       120   140   125
...
The BY variable is "Group", so would the desired output look like this?
Participant_id group Variable Time1  Time2  Time3
01             A     BP       120    140    125
01             A     ApoA1    238.65 279.72 171.58
...
02             B     BP       125    141    135
02             B     ApoA1    268.65 288.72 181.58
...
I think you may need more than one transpose: first get everything into one column, then parse the name to get the time point, and then transpose it back out to a semi-wide format with one column for each time point.
Tutorials for using either PROC TRANSPOSE or a data step are below. You could do it in one data step, but that takes a bit more typing.
Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/
And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd
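For reference, the single data step version could look something like the sketch below. The data set WIDE and the variables bp_v1 through hr_v3 are hypothetical stand-ins, and the sketch assumes every variable is named measure_visit so that the measure name can be pulled out with SCAN.

```sas
/* Hypothetical wide data: one row per participant, two measures, three visits */
data wide;
    input participant_id $ group $ bp_v1 bp_v2 bp_v3 hr_v1 hr_v2 hr_v3;
    datalines;
01 A 120 140 125 70 72 68
02 B 125 141 135 66 71 69
;
run;

data long_form;
    set wide;
    length variable $32;
    /* Parallel arrays: position i in each array is the same measure */
    array t1[*] bp_v1 hr_v1;
    array t2[*] bp_v2 hr_v2;
    array t3[*] bp_v3 hr_v3;
    do i = 1 to dim(t1);
        /* VNAME returns the name of the array element, e.g. bp_v1 */
        variable = scan(vname(t1[i]), 1, "_");
        time1 = t1[i];
        time2 = t2[i];
        time3 = t3[i];
        output;   /* one output row per participant per measure */
    end;
    keep participant_id group variable time1 time2 time3;
run;

/* Sort so that BY variable can be used in later procedures */
proc sort data=long_form;
    by variable participant_id;
run;
```

The "more typing" is the three ARRAY lists; the PROC TRANSPOSE route avoids listing variables but needs the extra parse and second transpose.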
@Reeza Could you please help me get clear on the desired output (i.e., assuming I'm trying to use BY group processing and PROC TTEST)?
Am I trying to get to something that looks like this?
Sample data:
data WORK.DATA_04;
infile datalines dsd truncover;
input Participant_ID:$3. Group:$1. Sex:$2. Age:BEST12. Efflux_V1:BEST12. Efflux_V2:BEST12. Efflux_V3:BEST12. ApoA1_V1:BEST12. ApoA1_V2:BEST12. ApoA1_V3:BEST12. ApoC1_V1:BEST12. ApoC1_V2:BEST12. ApoC1_V3:BEST12.;
format Age BEST12. Efflux_V1 BEST12. Efflux_V2 BEST12. Efflux_V3 BEST12. ApoA1_V1 BEST12. ApoA1_V2 BEST12. ApoA1_V3 BEST12. ApoC1_V1 BEST12. ApoC1_V2 BEST12. ApoC1_V3 BEST12.;
datalines;
1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
13 A F 71 13.17 12.69 11.33 259.03 265.83 255.03 61.34 67.46 73.5
14 B M 54 10.51 7.96 8.28 211.39 192.76 192.17 41.14 36.83 34.86
15 A F 66 7.34 6.74 8.69 240.58 160.97 205.72 35.8 25.89 44.28
16 B F 69 11.07 13.44 10.08 236.45 242.66 214.03 54.07 55.34 37.61
17 A F 58 8.1 7.62 8.03 188.51 159.8 164.22 36.04 32.35 30.78
18 B F 63 10.14 10.06 10.78 229.05 252.06 228.63 57.49 63.17 50.44
;;;;
Run;
Did you test that input code? Did it work for you? It doesn't for me unfortunately.
That being said, this should get you closer.
If you get errors post your full log and code.
proc transpose data=data_04 out=data_04a;
by participant_id group sex age;
run;
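data data_04b;
set data_04a;
* _name_ holds the original variable name, e.g. Efflux_V1;
* varName is the part before the underscore, e.g. Efflux;
varName = scan(_name_, 1, "_");
* TimePoint is the part after the underscore, e.g. V1;
TimePoint = scan(_name_, 2, "_");
*fake data: this overwrites the transposed values, so drop it for real data;
col1 = 25;
run;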
proc sort data=data_04b;
by participant_id group sex age varName timePoint;
run;
proc transpose data=data_04b out=data_05 prefix=timePoint;
by participant_id group sex age varName;
id timePoint;
var col1;
run;
proc sort data=data_05;
by varName group;
run;
proc ttest data=data_05;
by varName group;
paired timePointv1*timepointV2;
run;
proc ttest data=data_05;
by varName group;
paired timePointv2*timepointV3;
run;
@Reeza Sorry, but I'm not clear on the DATA step below.
data data_04b;
set data_04a;
varName = scan(_name_, 1, "_");
TimePoint = scan(_name_, 2, "_");
*fake data;
col1 = 25;
run;
What am I putting in the place of "varName"? One of the dependent variables?
What am I putting in the place of "TimePoint"?
Also, is this what you are referring to as the input code? If so, it does work for me.
data WORK.DATA_04;
infile datalines dsd truncover;
input Participant_ID:$3. Group:$1. Sex:$2. Age:BEST12. Efflux_V1:BEST12. Efflux_V2:BEST12. Efflux_V3:BEST12. ApoA1_V1:BEST12. ApoA1_V2:BEST12. ApoA1_V3:BEST12. ApoC1_V1:BEST12. ApoC1_V2:BEST12. ApoC1_V3:BEST12.;
format Age BEST12. Efflux_V1 BEST12. Efflux_V2 BEST12. Efflux_V3 BEST12. ApoA1_V1 BEST12. ApoA1_V2 BEST12. ApoA1_V3 BEST12. ApoC1_V1 BEST12. ApoC1_V2 BEST12. ApoC1_V3 BEST12.;
datalines;
1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
13 A F 71 13.17 12.69 11.33 259.03 265.83 255.03 61.34 67.46 73.5
14 B M 54 10.51 7.96 8.28 211.39 192.76 192.17 41.14 36.83 34.86
15 A F 66 7.34 6.74 8.69 240.58 160.97 205.72 35.8 25.89 44.28
16 B F 69 11.07 13.44 10.08 236.45 242.66 214.03 54.07 55.34 37.61
17 A F 58 8.1 7.62 8.03 188.51 159.8 164.22 36.04 32.35 30.78
18 B F 63 10.14 10.06 10.78 229.05 252.06 228.63 57.49 63.17 50.44
;;;;
Run;
Finally, is this the desired format of the data set?
Thank you.
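On the varName and TimePoint question: nothing is substituted for them. They are new variables that the DATA step creates by splitting each value of _NAME_ (the column PROC TRANSPOSE adds, holding the original variable name) at the underscore. A minimal sketch of what SCAN does, using three of the variable names from the sample data as input values and a made-up data set name SCAN_DEMO:

```sas
/* Each input row stands in for one value of _NAME_ after the first transpose */
data scan_demo;
    length _name_ $32 varName TimePoint $16;
    input _name_ $;
    varName   = scan(_name_, 1, "_");   /* first piece:  Efflux, ApoA1, ApoC1 */
    TimePoint = scan(_name_, 2, "_");   /* second piece: V1, V2, V3 */
    datalines;
Efflux_V1
ApoA1_V2
ApoC1_V3
;
run;

proc print data=scan_demo;
run;
```

This only works as-is when every dependent variable follows the same measure_visit naming pattern with a single underscore.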
Some example data would help, or at a minimum the output from Proc Contents, so we have a usable description of your variables.
You may end up reshaping the data, as what I think you are describing can get extremely cumbersome to keep track of quite quickly.
You may also be looking at Proc Multtest, which does adjustments for multiple tests from a single data set. It might be a good idea to read through the documentation for this procedure, at least the overview and getting started sections and the examples to see if things look similar to yours.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
Or look at other posts on the forum where data step code is included.
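One way the two procedures can be combined, sketched under assumptions: capture the p-values from PROC TTEST with ODS OUTPUT and hand them to PROC MULTTEST through INPVALUES=. The data set CHANGE, its values, and the names chg_bp and chg_hr are invented for illustration; Holm and FDR are shown only as examples of adjustments, not as a recommendation for this study.

```sas
/* Hypothetical change scores, one row per participant */
data change;
    input participant_id $ group $ chg_bp chg_hr;
    datalines;
01 A -20 -2
02 B -16 -5
03 A -4 2
04 B 2 4
05 A -9 1
06 B -3 -6
;
run;

/* Save the t-test table (one row per variable and method) to a data set */
ods output TTests=ttests_out;
proc ttest data=change;
    class group;
    var chg_bp chg_hr;
run;

/* Keep one p-value per variable; the pooled test is an arbitrary choice here */
data pvals;
    set ttests_out;
    where Method = "Pooled";
run;

/* Adjust the raw p-values stored in the Probt column */
proc multtest inpvalues(Probt)=pvals holm fdr;
run;
```

If PROC TTEST is run with a BY statement, the ODS table also carries the BY variables, so the same filter-then-adjust step should apply to that output.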
@ballardw I'm taking a shot at PROC MULTTEST. I'm wondering if you might be able to guide me a bit. I'm trying to compare means at 3 timepoints (V1, V2, V3) between groups (A, B).
Here is a subset of my dataset:
data WORK.DATA_04;
infile datalines dsd truncover;
input Participant_ID:$3. Group:$1. Sex:$2. Age:BEST12. Efflux_V1:BEST12. Efflux_V2:BEST12. Efflux_V3:BEST12. ApoA1_V1:BEST12. ApoA1_V2:BEST12. ApoA1_V3:BEST12. ApoC1_V1:BEST12. ApoC1_V2:BEST12. ApoC1_V3:BEST12.;
format Age BEST12. Efflux_V1 BEST12. Efflux_V2 BEST12. Efflux_V3 BEST12. ApoA1_V1 BEST12. ApoA1_V2 BEST12. ApoA1_V3 BEST12. ApoC1_V1 BEST12. ApoC1_V2 BEST12. ApoC1_V3 BEST12.;
datalines;
1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
13 A F 71 13.17 12.69 11.33 259.03 265.83 255.03 61.34 67.46 73.5
14 B M 54 10.51 7.96 8.28 211.39 192.76 192.17 41.14 36.83 34.86
15 A F 66 7.34 6.74 8.69 240.58 160.97 205.72 35.8 25.89 44.28
16 B F 69 11.07 13.44 10.08 236.45 242.66 214.03 54.07 55.34 37.61
17 A F 58 8.1 7.62 8.03 188.51 159.8 164.22 36.04 32.35 30.78
18 B F 63 10.14 10.06 10.78 229.05 252.06 228.63 57.49 63.17 50.44
;;;;
Run;
Here is my initial syntax:
ods graphics on;
PROC MULTTEST DATA=WORK.DATA_04 bootstrap nsample=20000 seed=41287 notables plots=PByTest(vref=0.05 0.1);
/* BY variables; */
/* Must sort by the BY variable first */
/* Not clear to me whether to use BY or CLASS */
CLASS group; /* Group variable */
/* CONTRAST 'label' values; */
/* FREQ variable; */
/* ID variables; */
/* STRATA variable; */
TEST MEAN (Efflux_V1--ApoC1_V3); /* MEAN - Requests the t test for the mean */
run;
ods graphics off;
The log says, "ERROR: There is no input from the dataset." I guess I can't even get the DATA statement correct?!
Thanks for your help.
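One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the DSD option on the INFILE statement makes the comma the default delimiter, so if the datalines really are separated by blanks as they appear in the post, the fields after Participant_ID would not be read and Group would be missing on every row. A minimal sketch of a check, reading the first four rows of the posted data without DSD into a made-up data set name DATA_04_CHECK and tabulating Group:

```sas
/* Same first four rows as the posted data, read with blanks as the delimiter */
data work.data_04_check;
    infile datalines truncover;   /* no DSD, so blanks separate the fields */
    input Participant_ID :$3. Group :$1. Sex :$2. Age
          Efflux_V1 Efflux_V2 Efflux_V3
          ApoA1_V1 ApoA1_V2 ApoA1_V3
          ApoC1_V1 ApoC1_V2 ApoC1_V3;
    datalines;
1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
;
run;

/* Confirm that Group was actually populated before running PROC MULTTEST */
proc freq data=work.data_04_check;
    tables Group / missing;
run;
```

Running the same PROC FREQ against WORK.DATA_04 itself would show whether the CLASS variable has any nonmissing values in the data set PROC MULTTEST is reading.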