02-07-2016 07:32 PM
I am running a t-test on a set of observations. There is one observation that is a massive outlier and I wanted to rerun the test without this observation. However, I don't want to go through the rigmarole of completely deleting the data point. Is there a way to get SAS to run the t-test without a certain observation?
02-07-2016 07:52 PM
If you have some way to uniquely identify the observation then yes, use a WHERE statement in your proc ttest.
Typically a WHERE statement can be included in almost all Procs.
Where obs_id ne 10;
02-07-2016 07:59 PM
So do I include a statement after the WHERE statement to ask SAS to remove it from the analysis? Or do I simply include that in the proc and it will run without observation 10
PROC TTEST data=ctpeat3;
Thanks Reeza, you're a beast.
02-07-2016 08:07 PM
Do you have observation IDs?
You include the WHERE in your proc and it will automatically exclude that observation from the analysis.
PROC TTEST data=ctpeat3;
   where obs ne 10;
   class CTPPP;
   var eatv;
RUN;
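If it helps to see the effect side by side, here is a minimal sketch that runs the test with and without the suspect row. The data set name ctpeat3_demo and all values in it are invented purely for illustration, and it assumes obs is a numeric ID variable. WHERE only filters the rows the procedure reads, so the data set itself is left untouched.

```sas
/* Hypothetical demo data: obs is a row ID, CTPPP a two-level group,  */
/* eatv the outcome. Values are made up; obs 10 plays the outlier.    */
data ctpeat3_demo;
   input obs CTPPP eatv;
   datalines;
1 0 12.1
2 0 11.4
3 0 13.0
4 0 12.6
5 0 11.9
6 1 14.2
7 1 13.8
8 1 15.1
9 1 14.6
10 1 98.0
;
run;

/* All observations, outlier included */
proc ttest data=ctpeat3_demo;
   class CTPPP;
   var eatv;
run;

/* Same test without obs 10; ctpeat3_demo still contains all 10 rows */
proc ttest data=ctpeat3_demo;
   where obs ne 10;
   class CTPPP;
   var eatv;
run;
```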
02-08-2016 03:09 PM - edited 02-08-2016 03:12 PM
If I may step in here, logical operators such as AND, OR, NOT are commonly used in WHERE statements. As always in programming, you have to use the correct syntax, though.
Example: Let's say you want to exclude a patient if they have patient number 10 or 13. In SAS, the OR operator cannot be placed between the numbers 10 and 13, as is possible in natural language. It must be placed between two expressions, each of which evaluates to true or false:
where not (pt_no=10 or pt_no=13);
This is logically equivalent to:
where pt_no ne 10 and pt_no ne 13;
But typically, this would be written (again equivalently) using the IN operator:
where pt_no not in (10, 13);
Or even shorter:
where pt_no ~in (10 13);
(The abbreviation of NOT as tilde (~) might not be available on all keyboards, but you can use the caret (^) instead. The comma is optional in lists used with the IN operator. These lists may contain more than two values.)
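To check the equivalence yourself, a small sketch like the following can be run. The data set patients_demo and its values are hypothetical, made up only for this illustration. Each PROC PRINT step should list the same three patients (9, 11 and 14).

```sas
/* Hypothetical patient data, invented for illustration */
data patients_demo;
   input pt_no bmi;
   datalines;
9 24.5
10 31.2
11 22.8
13 27.9
14 26.1
;
run;

/* Form 1: negate an OR of two complete comparisons */
proc print data=patients_demo;
   where not (pt_no=10 or pt_no=13);
run;

/* Form 2: two "not equal" comparisons joined by AND */
proc print data=patients_demo;
   where pt_no ne 10 and pt_no ne 13;
run;

/* Form 3: NOT IN with a value list */
proc print data=patients_demo;
   where pt_no not in (10, 13);
run;
```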
You wrote that the observation to be excluded was a "massive outlier." In this case, an alternative would be to replace the "hard-coded" condition on PT_NO with a condition on the measurement variable(s) that characterizes the data point as an extreme outlier (e.g. where 10 <= bmi <= 60;).
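A rough sketch of that range-based approach is shown here. The data set bmi_demo, the grouping variable grp and all values are assumptions made up for the example; the bounds 10 and 60 are simply the ones from the example condition and would need to be chosen to suit the actual measurement.

```sas
/* Hypothetical data, invented for illustration; patient 8 has an */
/* implausible BMI that the range condition filters out.          */
data bmi_demo;
   input pt_no grp bmi;
   datalines;
1 0 22.4
2 0 25.1
3 0 27.8
4 0 24.0
5 1 29.3
6 1 31.0
7 1 28.6
8 1 260
;
run;

/* Keep only rows whose BMI falls inside the plausible range */
proc ttest data=bmi_demo;
   where 10 <= bmi <= 60;
   class grp;
   var bmi;
run;
```

One thing to keep in mind: SAS treats a missing numeric value as smaller than any number, so a condition like this also drops rows where bmi is missing. PROC TTEST would exclude those rows from the analysis of bmi anyway, but the same condition in other procedures can change which rows are used.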