_maldini_
Barite | Level 11

I have a dataset from a small RCT w/ 2 groups, 3 timepoints and > 80 continuous dependent variables. The analysis plan calls for two-sided independent samples t-tests for most of the dependent variables, w/ adjustment for multiple comparisons.* 

 

I plan to use PROC TTEST to compare the mean change from each time point to the others for EACH dependent variable across the 2 groups. 

 

I’m assuming I need to create new variables to represent the mean change between time points.

 

Example: Dependent Variable = Blood Pressure (BP) for Group A

 

BP_timepoint_1

BP_timepoint_2

BP_timepoint_3

 

Change_BP_timepoint_1_2 = BP_timepoint_1 - BP_timepoint_2

Change_BP_timepoint_1_3 = BP_timepoint_1 - BP_timepoint_3

Change_BP_timepoint_2_3 = BP_timepoint_2 - BP_timepoint_3

 

What is the best way - for a novice SAS user - to efficiently create a large number of new variables to represent the differences between time points for each of the dependent variables?

 

* I realize that running all these t-tests is inefficient, but that is what I have been asked to do. If you think I should use PROC GLM or PROC MIXED, instead, would I still need to create all these new variables with those approaches? I have no experience w/ either approach…

 

 

 

11 REPLIES
Reeza
Super User

Here's a tutorial on using Arrays in SAS
https://stats.idre.ucla.edu/sas/seminars/sas-arrays/

 

You could use an array to calculate the differences, but if all your values are numeric, it may also make sense to transpose your data so you can use BY-group processing.

 

Switch your data to a format such as:

 

Variable Time1 Time2 Time3;
BP 120 140 125
...
...

Then you can use BY in PROC TTEST to do all tests at once.

 

proc ttest data=long_form;
by variable;
paired time1*time2;
paired time2*time3;
run;
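
For the array route mentioned above, a minimal sketch might look like the following. It assumes the data stay in wide form with the naming pattern from the question (BP_timepoint_1 and so on). The data set names work.have and work.want, the second outcome HR_*, and the HR values in the datalines are made up purely for illustration.

```sas
/* Toy input: BP values are from the example in this thread, HR values are invented */
data work.have;
   input participant_id $ group $
         BP_timepoint_1 BP_timepoint_2 BP_timepoint_3
         HR_timepoint_1 HR_timepoint_2 HR_timepoint_3;
datalines;
01 A 120 140 125 70 72 68
02 B 125 141 135 66 75 71
;

data work.want;
   set work.have;
   /* One array per time point; list the outcomes in the same order in each */
   array t1 {*} BP_timepoint_1 HR_timepoint_1;
   array t2 {*} BP_timepoint_2 HR_timepoint_2;
   array t3 {*} BP_timepoint_3 HR_timepoint_3;
   /* Arrays of new variables that will hold the differences */
   array d12 {*} Change_BP_timepoint_1_2 Change_HR_timepoint_1_2;
   array d13 {*} Change_BP_timepoint_1_3 Change_HR_timepoint_1_3;
   array d23 {*} Change_BP_timepoint_2_3 Change_HR_timepoint_2_3;
   do i = 1 to dim(t1);
      d12{i} = t1{i} - t2{i};
      d13{i} = t1{i} - t3{i};
      d23{i} = t2{i} - t3{i};
   end;
   drop i;
run;
```

With more than 80 outcomes the variable lists get long, which is one reason the transposed layout may end up being less typing.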


_maldini_
Barite | Level 11

@Reeza I'm not sure how to transpose my data into that form. 

 

It's currently in this form:

[Image: Screen Shot 2021-11-11 at 8.49.41 AM.png]

When I try to transpose it, it ends up like this:

[Image: Screen Shot 2021-11-11 at 8.49.59 AM.png]

This is my syntax:

proc sort data=meta.data_01;
by participant_id;
run;

proc transpose data=meta.data_01 out=meta.data_01_long prefix=value_;
   by participant_id;
run;
PROC PRINT DATA=meta.data_01_long;
RUN;

Also, is there a way to adjust for multiple comparisons using "BY in PROC TTEST"?

 

Thank you.

_maldini_
Barite | Level 11

I'm not entirely clear on this:

Variable Time1 Time2 Time3;
BP 120 140 125
...
...

The BY variable is "Group", so would the desired output look like this?

Participant_id         group     Variable          Time1     Time2     Time3;
         01             A           BP              120        140       125
         01             A           ApoA1           238.65     279.72   171.58
...
         02             B           BP              125        141       135
         02             B           ApoA1           268.65     288.72   181.58
...
Reeza
Super User

I think you may need more than one transpose: first transpose to get everything into one column, then parse the variable name to get the time point, and then transpose it back out to a semi-wide format with one column for each time point.

Tutorials are below for using either PROC TRANSPOSE or a DATA step. You could do it in one DATA step, but that takes a bit more typing.

 

Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

And sometimes a double transpose is needed for extra wide data sets:
https://gist.github.com/statgeek/2321b6f62ab78d5bf2b0a5a8626bd7cd

_maldini_
Barite | Level 11

@Reeza Could you please help me get clear on the desired output (i.e., assuming I'm trying to use BY group processing and PROC TTEST)?

 

Am I trying to get to something that looks like this?

[Image: Screen Shot 2021-11-12 at 3.34.26 PM.png]

Sample data:

 data WORK.DATA_04;
   infile datalines truncover;  /* values below are blank-delimited, so DSD (comma delimiter by default) is dropped */
   input Participant_ID:$3. Group:$1. Sex:$2. Age:BEST12. Efflux_V1:BEST12. Efflux_V2:BEST12. Efflux_V3:BEST12. ApoA1_V1:BEST12. ApoA1_V2:BEST12. ApoA1_V3:BEST12. ApoC1_V1:BEST12. ApoC1_V2:BEST12. ApoC1_V3:BEST12.;
   format Age BEST12. Efflux_V1 BEST12. Efflux_V2 BEST12. Efflux_V3 BEST12. ApoA1_V1 BEST12. ApoA1_V2 BEST12. ApoA1_V3 BEST12. ApoC1_V1 BEST12. ApoC1_V2 BEST12. ApoC1_V3 BEST12.;
 datalines;
 1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
 10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
 11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
 12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
 13 A F 71 13.17 12.69 11.33 259.03 265.83 255.03 61.34 67.46 73.5
 14 B M 54 10.51 7.96 8.28 211.39 192.76 192.17 41.14 36.83 34.86
 15 A F 66 7.34 6.74 8.69 240.58 160.97 205.72 35.8 25.89 44.28
 16 B F 69 11.07 13.44 10.08 236.45 242.66 214.03 54.07 55.34 37.61
 17 A F 58 8.1 7.62 8.03 188.51 159.8 164.22 36.04 32.35 30.78
 18 B F 63 10.14 10.06 10.78 229.05 252.06 228.63 57.49 63.17 50.44
 ;;;;
Run;
Reeza
Super User

Did you test that input code? Did it work for you? It doesn't for me unfortunately. 

 

That being said, this should get you closer. 

If you get errors post your full log and code. 

proc transpose data=data_04 out=data_04a;
by participant_id group sex age;
run;

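data data_04b;
set data_04a;
/* _name_ holds the original variable name, e.g. Efflux_V1 */
varName = scan(_name_, 1, "_");   /* part before the underscore, e.g. Efflux */
TimePoint = scan(_name_, 2, "_"); /* part after the underscore, e.g. V1 */
/* Placeholder for when the sample data will not load: uncomment only to fake the values */
/* col1 = 25; */
run;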

proc sort data=data_04b;
by participant_id group sex age varName timePoint;
run;

proc transpose data=data_04b out=data_05 prefix=timePoint;
by participant_id group sex age varName;
id timePoint;
var col1;
run;

proc sort data=data_05;
by varName group;
run;

proc ttest data=data_05;
by varName group;
paired timePointv1*timepointV2;
run;

proc ttest data=data_05;
by varName group;
paired timePointv2*timepointV3;
run;
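
The PAIRED tests above compare time points within each group. The original question asks for something slightly different: independent-samples t-tests of the change scores between the two groups, with a multiplicity adjustment. A rough sketch of that variant follows. It assumes data_05 was built by the steps above and that col1 in data_04b holds the real measurements rather than the placeholder value of 25. The data set names change, ttest_results and pvals are made up for illustration, and using the pooled test with Holm and FDR adjustments is only one possible choice, not something the analysis plan in this thread specifies.

```sas
/* Change scores in the semi-wide layout: one row per participant and varName */
data change;
   set data_05;
   change_1_2 = timePointV1 - timePointV2;
   change_1_3 = timePointV1 - timePointV3;
   change_2_3 = timePointV2 - timePointV3;
run;

proc sort data=change;
   by varName;
run;

/* Two-sample t tests of each change score between groups, one BY group per outcome */
ods output TTests=ttest_results;
proc ttest data=change;
   by varName;
   class group;
   var change_1_2 change_1_3 change_2_3;
run;

/* Keep one p-value per test; PROC MULTTEST expects it in a variable named raw_p */
data pvals;
   set ttest_results;
   where Method = "Pooled";
   raw_p = Probt;
   keep varName Variable raw_p;
run;

/* Adjust the collected p-values (Holm step-down and false discovery rate) */
proc multtest inpvalues=pvals holm fdr;
run;
```

Because the p-values are collected across all BY groups before PROC MULTTEST runs, the adjustment covers every outcome and every pair of time points at once.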
_maldini_
Barite | Level 11

@Reeza Sorry, but I'm not clear on the DATA step below.

data data_04b;
set data_04a;
varName = scan(_name_, 1, "_");
TimePoint = scan(_name_, 2, "_");
*fake data;
col1 = 25;
run;

What am I putting in the place of "varName"? One of the dependent variables?

What am I putting in the place of "TimePoint"? 

 

Also, is this what you are referring to as the input code? If so, it does work for me.

data WORK.DATA_04;
   infile datalines truncover;  /* values below are blank-delimited, so DSD (comma delimiter by default) is dropped */
   input Participant_ID:$3. Group:$1. Sex:$2. Age:BEST12. Efflux_V1:BEST12. Efflux_V2:BEST12. Efflux_V3:BEST12. ApoA1_V1:BEST12. ApoA1_V2:BEST12. ApoA1_V3:BEST12. ApoC1_V1:BEST12. ApoC1_V2:BEST12. ApoC1_V3:BEST12.;
   format Age BEST12. Efflux_V1 BEST12. Efflux_V2 BEST12. Efflux_V3 BEST12. ApoA1_V1 BEST12. ApoA1_V2 BEST12. ApoA1_V3 BEST12. ApoC1_V1 BEST12. ApoC1_V2 BEST12. ApoC1_V3 BEST12.;
 datalines;
 1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
 10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
 11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
 12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
 13 A F 71 13.17 12.69 11.33 259.03 265.83 255.03 61.34 67.46 73.5
 14 B M 54 10.51 7.96 8.28 211.39 192.76 192.17 41.14 36.83 34.86
 15 A F 66 7.34 6.74 8.69 240.58 160.97 205.72 35.8 25.89 44.28
 16 B F 69 11.07 13.44 10.08 236.45 242.66 214.03 54.07 55.34 37.61
 17 A F 58 8.1 7.62 8.03 188.51 159.8 164.22 36.04 32.35 30.78
 18 B F 63 10.14 10.06 10.78 229.05 252.06 228.63 57.49 63.17 50.44
 ;;;;
Run;

Finally, is this the desired format of the data set?

[Image: Screen Shot 2021-11-12 at 3.34.26 PM.png]

 

Thank you.

ballardw
Super User

Please post some example data, or at a minimum the output from PROC CONTENTS, so we have a usable description of your variables.

 

You may end up reshaping the data, because what I think you are describing can get extremely cumbersome to keep track of quite quickly.

 

You may also be looking at Proc Multtest, which does adjustments for multiple tests from a single data set.  It might be a good idea to read through the documentation for this procedure, at least the overview and getting started sections and the examples to see if things look similar to yours.

_maldini_
Barite | Level 11
Thank you. What is the best way to provide example data?
ballardw
Super User

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 

Or look at other posts on the forum where data step code is included.

_maldini_
Barite | Level 11

@ballardw I'm taking a shot at PROC MULTTEST. I'm wondering if you might be able to guide me a bit. I'm trying to compare means at 3 timepoints (V1, V2, V3) between groups (A, B). 

 

Here is a subset of my dataset:

 data WORK.DATA_04;
   infile datalines truncover;  /* values below are blank-delimited, so DSD (comma delimiter by default) is dropped */
   input Participant_ID:$3. Group:$1. Sex:$2. Age:BEST12. Efflux_V1:BEST12. Efflux_V2:BEST12. Efflux_V3:BEST12. ApoA1_V1:BEST12. ApoA1_V2:BEST12. ApoA1_V3:BEST12. ApoC1_V1:BEST12. ApoC1_V2:BEST12. ApoC1_V3:BEST12.;
   format Age BEST12. Efflux_V1 BEST12. Efflux_V2 BEST12. Efflux_V3 BEST12. ApoA1_V1 BEST12. ApoA1_V2 BEST12. ApoA1_V3 BEST12. ApoC1_V1 BEST12. ApoC1_V2 BEST12. ApoC1_V3 BEST12.;
 datalines;
 1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
 10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
 11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
 12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
 13 A F 71 13.17 12.69 11.33 259.03 265.83 255.03 61.34 67.46 73.5
 14 B M 54 10.51 7.96 8.28 211.39 192.76 192.17 41.14 36.83 34.86
 15 A F 66 7.34 6.74 8.69 240.58 160.97 205.72 35.8 25.89 44.28
 16 B F 69 11.07 13.44 10.08 236.45 242.66 214.03 54.07 55.34 37.61
 17 A F 58 8.1 7.62 8.03 188.51 159.8 164.22 36.04 32.35 30.78
 18 B F 63 10.14 10.06 10.78 229.05 252.06 228.63 57.49 63.17 50.44
 ;;;;
Run;

Here is my initial syntax: 

ods graphics on;
PROC MULTTEST DATA=WORK.DATA_04 bootstrap nsample=20000 seed=41287 notables
              plots=PByTest(vref=0.05 0.1);
              
/* 	BY variables; */
/* 	Must sort by the BY variable first */
/* 	Not clear to me whether to use BY or CLASS */
	
	CLASS group;
	/* Group variable */
	
/* 	CONTRAST 'label' values; */
/* 	 */
/* 	FREQ variable; */
/* 	 */
/* 	ID variables; */
/* 	 */
/* 	STRATA variable; */
	
	TEST MEAN (Efflux_V1--ApoC1_V3);
	/* MEAN - Requests the t test for the mean */
run;
ods graphics off;

The log says, "ERROR: There is no input from the dataset." I guess I can't even get the DATA statement correct?!

 

Thanks for your help.
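
When a procedure reports that it has no input, one thing worth checking (a guess, not a confirmed diagnosis of the error above) is whether the numeric variables were actually read. The DSD option on INFILE changes the default delimiter to a comma, so blank-delimited datalines need to be read without it. The sketch below reads the first four rows of the sample data with the default blank delimiter, checks for missing values, and then runs a simplified PROC MULTTEST. The data set name work.data_04_check is made up, and the Holm adjustment stands in for the bootstrap only to keep the example short.

```sas
/* First four rows of the sample data posted in this thread */
data work.data_04_check;
   infile datalines truncover;   /* no DSD, so blanks act as the delimiter */
   input Participant_ID :$3. Group :$1. Sex :$2. Age
         Efflux_V1-Efflux_V3 ApoA1_V1-ApoA1_V3 ApoC1_V1-ApoC1_V3;
datalines;
1 A M 52 11.68 12.59 11.21 238.65 279.72 171.58 41.22 62.36 36.07
10 B M 68 9.58 9.18 10.78 215.79 214.9 253.98 47.33 38 50.52
11 A F 71 12.26 9.17 9.94 282.3 227.08 282.3 44.13 44.21 44.13
12 B M 71 5.88 9.45 10.55 173.07 230.49 174.09 47.8 51.28 37.81
;

/* N should match the number of rows per group and NMiss should be 0 */
proc means data=work.data_04_check n nmiss mean;
   class Group;
   var Efflux_V1--ApoC1_V3;
run;

/* t tests for the mean of each variable across groups, with Holm-adjusted p-values */
proc multtest data=work.data_04_check holm;
   class Group;
   test mean(Efflux_V1--ApoC1_V3);
run;
```

If PROC MEANS shows every value as missing, the problem is in the DATA step that reads the data rather than in the PROC MULTTEST syntax.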

 

 
