geneticsum
Fluorite | Level 6

I'm trying to run a matched-pairs t-test (SAS 9.3) on some gene expression data, matching the expression values for unaffected subjects to the expression values for affected subjects. Each gene has two variables, one per affection state: unaffected_gene_x and affected_gene_x. There are at least 2500 genes, so I've written a macro. The problem is that the results come up with n=0.

Here is the code (I've set total=10 for the macro so I could check that it works before running all the genes):

%macro runtime(total);
   %do i=1 %to &total.;
      proc ttest data=pairs;
         paired unaffected_gene_&i.*affected_gene_&i.;
      run;
   %end;
%mend runtime;

%runtime(10);

I have double-checked the PAIRS data set to make sure it has all the appropriate data. Each variable has 3 observations and 3 missing values (n=6 individuals in total). For example, unaffected_gene_1 has values for the 3 unaffected individuals and missing values for the 3 affected individuals. The log shows no errors (e.g., about not finding the variables).

Accepted Solution
ballardw
Super User

Show some example data for one of the pairs; 10 or 15 records should be enough.

Any record that does not have values for both variables on the PAIRED statement is excluded, which sounds like the issue given your description of the missing values.
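A quick way to confirm this is to count the rows that PAIRED can actually use. The DATA _NULL_ step below is a sketch that assumes the PAIRS data set and the gene 1 variable names from the question; it writes the count to the log. With the layout described in the question (each row has a value for only one state), it should report 0, which would match the n=0 in the output.

```sas
/* Count rows where BOTH members of the gene 1 pair are non-missing;     */
/* only those rows are used by PAIRED unaffected_gene_1*affected_gene_1  */
data _null_;
   set pairs end=last;
   /* sum statement: n_pairs starts at 0 and is retained across rows */
   if not missing(unaffected_gene_1) and not missing(affected_gene_1) then n_pairs + 1;
   if last then put 'Rows usable by PAIRED for gene 1: ' n_pairs;
run;
```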

PAIRED is intended for before/after measurements on the same subject (or similar matched contexts).

If the data come from different GROUPS of records, then you should have a classification variable that takes the values Affected and Unaffected, which goes on a CLASS statement, and a single variable holding the numeric value, such as Gene_1, which goes on a VAR statement.
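For a single gene, that two-group layout might look like the sketch below. It assumes the wide PAIRS data set from the question; GENE1_LONG is a hypothetical name for a long data set with one record per non-missing value, a STATUS class variable, and a VALUE response. It only handles one gene at a time.

```sas
/* Hypothetical long data set for gene 1: one record per non-missing value */
data gene1_long;
   set pairs;
   length status $10;
   if not missing(unaffected_gene_1) then do;
      status = 'Unaffected';
      value  = unaffected_gene_1;
      output;
   end;
   if not missing(affected_gene_1) then do;
      status = 'Affected';
      value  = affected_gene_1;
      output;
   end;
   keep status value;
run;

/* Two-sample t-test: STATUS defines the groups, VALUE is the response */
proc ttest data=gene1_long;
   class status;
   var value;
run;
```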

You may want to look into reshaping your data and running without a macro. The following might work for 10 genes; if it does, change the array definitions to 2500 or however many gene variables you actually have.

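data need;
   set pairs;
   /* one array per affection state; extend both ranges to all genes */
   array u unaffected_gene_1-unaffected_gene_10;
   array a affected_gene_1-affected_gene_10;
   /* one record per gene per status; rows with a missing VALUE */
   /* are simply ignored by PROC TTEST                          */
   do gene=1 to dim(u);
      status='Unaffected'; value=u[gene]; output;
      status='Affected';   value=a[gene]; output;
   end;
   keep gene status value;
run;

/* BY processing requires the data sorted by GENE */
proc sort data=need;
   by gene;
run;

/* Affected vs. Unaffected two-sample t-test, one test per gene */
proc ttest data=need;
   by gene;
   class status;
   var value;
run;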

geneticsum
Fluorite | Level 6

I agree, that sounds like the problem: each row has a value for either the affected state or the unaffected state, not both. The data set is constructed this way because each row is an individual subject.

Thank you very much! I tried the code and it works beautifully. If you have a little more time: is there a way to filter the results so I can narrow them down to the genes where the difference is significant?

ballardw
Super User

I suspected this question would come next.

I would create one or more data sets from the output. You can create data sets of the various test or summary statistics with ODS OUTPUT. The statement to add to the PROC TTEST code would look something like:

ods output ttests=yourlib.ttestsdatasetname;

TTests in the statement above is the name of an ODS table generated by the procedure. Other tables hold the other requested results; their names are listed in the procedure documentation under Details > ODS Table Names.

The data set should contain the t statistic in a variable named tValue and Pr > |t| in Probt. There will be two records per test, one for each method of estimating the variance (Pooled and Satterthwaite), identified by the variable Method, plus the BY variable(s) and the VAR variable name(s). You would then filter on Method and on the tValue or Probt range of interest.
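Putting this together, a sketch might look like the following. It assumes the NEED data set from the accepted solution; WORK.TTEST_RESULTS and WORK.SIG_GENES are hypothetical names, and the 0.05 cutoff and the choice of the Satterthwaite rows are only examples (use Method='Pooled' if you are willing to assume equal variances).

```sas
/* Save the TTests table: one Pooled and one Satterthwaite row per gene */
ods output ttests=work.ttest_results;

proc ttest data=need;
   by gene;
   class status;
   var value;
run;

/* Keep genes whose unequal-variance test has p < 0.05.            */
/* A missing Probt would compare as smaller than 0.05 in SAS, so   */
/* missing p-values are excluded explicitly.                       */
data work.sig_genes;
   set work.ttest_results;
   where method = 'Satterthwaite' and not missing(probt) and probt < 0.05;
run;
```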

If you want to reduce the amount of output sent to the results window, for example if you only want the t-test results, you can also use ODS SELECT (or ODS EXCLUDE) immediately before the PROC TTEST statement so that only the desired output table(s) are created.
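For example, the sketch below (again assuming the NEED data set) displays only the TTests table. ODS selection lists are normally reset at the end of each step, so this ODS SELECT should affect only this PROC TTEST run.

```sas
/* Display only the TTests table for this step; the other tables */
/* (such as Statistics, ConfLimits, Equality) and plots are not shown */
ods select ttests;

proc ttest data=need;
   by gene;
   class status;
   var value;
run;
```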

geneticsum
Fluorite | Level 6
Wonderful, thank you!