Solved: PROC TTEST not reading variables

geneticsum · Posted 04-17-2019 11:58 AM

I'm trying to run a matched pairs t-test (SAS 9.3) on some gene expression data, matching the gene expression values for unaffected subjects to expression for affected subjects. The two affection states make the two variables unaffected_gene_x and affected_gene_x. There are at least 2500 genes so I've constructed a macro. Problem is, results come up with n=0.

Here is the code: (I've set total=10 for the macro so I could check if it worked before doing all the genes)

%macro runtime(total);
%do I=1 %to &total.;

proc ttest data=pairs;
	paired unaffected_gene_&i.*affected_gene_&i.;
run;

%end;

%mend;
%runtime(10);

I have double-checked the pairs dataset to ensure it has all the appropriate data. Each variable has 3 observations and 3 missing values (n=6 total individuals). For example, the variable unaffected_gene_1 has 3 values for the unaffected individuals, and 3 missing values for the affected individuals. The log does not indicate any errors (i.e., no problems finding the variables).

ballardw · Posted 04-17-2019 12:21 PM

Show some example data for one of the pairs, 10 or 15 records should provide sufficient information.

Any record that does not have values for both variables on the PAIRED statement is excluded, which sounds like the issue given your description of the "missing" values.
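One quick way to check this, as a sketch only: count the records where both variables of a pair are non-missing. The query assumes the data set and variable names from your post (pairs, unaffected_gene_1, affected_gene_1); the column name n_complete_pairs is just a label I made up. If the count is 0, PAIRED has nothing to work with for that pair.

```sas
/* Count records that have BOTH values of the first pair.     */
/* N() returns the number of non-missing numeric arguments.   */
proc sql;
   select count(*) as n_complete_pairs
   from pairs
   where n(unaffected_gene_1, affected_gene_1) = 2;
quit;
```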

PAIRED is intended for before/after measurements (or similar contexts) where each record carries both values.

If the data is from different GROUPS of records then you should have a classification variable that takes values of Affected and Unaffected, that would go on a CLASS statement, and a single variable holding the numeric value, such as Gene_1 on a VAR statement.

You may want to look into reshaping your data and running without a macro. This might work for 10 genes. If it does, change the array definitions to 2500, or however many gene variables you actually have.

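data need;
   set pairs;
   /* set the length explicitly so it does not depend on assignment order */
   length status $10;
   array u unaffected_gene_1-unaffected_gene_10;
   array a affected_gene_1-affected_gene_10;
   /* wide to long: one output row per gene per status */
   do gene=1 to dim(u);
      status='Unaffected'; value=u[gene]; output;
      status='Affected';   value=a[gene]; output;
   end;
   keep gene status value;
run;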
proc sort data=need;
   by gene;
run;

proc ttest data=need;
   by gene;
   class status ;
   var value;
run;

geneticsum · Posted 04-17-2019 12:31 PM

I agree, that sounds like the problem here - each row only has a value for either the affected state or the unaffected state, not both. The reason the dataset is constructed this way is because each row is an individual subject.

Thank you very much! I have tried the code and it works beautifully. If you have a little more time - do you know if there is a way to filter results so I can narrow it down to any genes where difference is significant?

ballardw · Posted 04-17-2019 01:50 PM

I sort of thought something like this question would be the next bit.

I would create one or more data sets from the output. You can create data sets of the various test or summary statistics using ODS OUTPUT. The syntax to add to the Proc TTEST code would look something like:

Ods output ttests = yourlib.ttestsdatasetname;

TTests, which follows ODS OUTPUT in the statement above, is the name of an ODS table generated by the procedure. Other tables hold the other pieces of output; their names can be found in the documentation for the procedure under Details > ODS Table Names.

The dataset should have the t-statistic in a variable named tValue and the Pr > |t| value in Probt. There will be two records per test, one for each method of estimating variance (Pooled and Satterthwaite), identified in the variable Method, plus the BY variable(s) and the VAR variable name(s). You would filter the desired information based on the method and the tValue or Probt range of interest.

If you want to reduce the amount of output sent to the results, such as only wanting the t-test results, you can also place an ODS SELECT (or ODS EXCLUDE) statement immediately prior to the PROC TTEST statement so that only the desired output table(s) are created.
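Putting those pieces together, a rough sketch might look like the following. It assumes the reshaped data set need from earlier in the thread. The output data set names work.gene_ttests and work.significant, the choice of the Satterthwaite row, and the 0.05 cutoff are placeholders of mine, not anything the procedure requires.

```sas
/* Show only the t-test table in the results (applies to the next step). */
ods select ttests;
/* Capture that table as a data set; the data set name is a placeholder. */
ods output ttests=work.gene_ttests;

proc ttest data=need;
   by gene;
   class status;
   var value;
run;

/* Keep one row per gene: one variance method plus a p-value cutoff.     */
/* Missing p-values are excluded explicitly, because SAS treats a        */
/* missing numeric value as smaller than any number.                     */
data work.significant;
   set work.gene_ttests;
   where method = 'Satterthwaite'
     and not missing(probt)
     and probt < 0.05;
run;

proc print data=work.significant;
   var gene method tvalue df probt;
run;
```

Swap 'Satterthwaite' for 'Pooled' if you prefer the equal-variance version of the test, and adjust the cutoff to whatever threshold you want to use.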

geneticsum · Posted 04-17-2019 01:53 PM

Wonderful, thank you!
