Hi There,
I am running a t-test on a set of observations. There is one observation that is a massive outlier and I wanted to rerun the test without this observation. However, I don't want to go through the rigmarole of completely deleting the data point. Is there a way to get SAS to run the t-test without a certain observation?
Cheers!
For anyone wondering, I just managed to do this by adding the following to my T-test procedure.
WHERE pt_no NE 10;
If you have some way to uniquely identify the observation then yes, use a WHERE statement in your proc ttest.
Typically a WHERE statement can be included in almost all Procs.
Where obs_id = 10;
So do I include a statement after the WHERE statement to ask SAS to remove it from the analysis? Or do I simply include that in the proc and it will run without observation 10?
PROC TTEST data=ctpeat3;
CLASS CTPPP;
var eatv;
RUN;
Thanks Reeza, you're a beast.
Do you have observation IDs?
You include the WHERE statement in your proc and it will automatically exclude that observation from the analysis.
PROC TTEST data=ctpeat3;
Where obs ne 10;
CLASS CTPPP;
var eatv;
RUN;
Hey Reeza,
Can you exclude multiple observations using this method? It seems where statements can't be combined with an OR statement, is that right?
Cheers
If I may step in here, logical operators such as AND, OR, NOT are commonly used in WHERE statements. As always in programming, you have to use the correct syntax, though.
Example: Let's say you want to exclude a patient if they have patient number 10 or 13. In SAS, the OR operator cannot be placed between the numbers 10 and 13, as it would be in everyday language. It must be placed between two expressions that each evaluate to true or false:
where not (pt_no=10 or pt_no=13);
This is logically equivalent to:
where pt_no ne 10 and pt_no ne 13;
But typically, this would be written (again equivalently) using the IN operator:
where pt_no not in (10, 13);
Or even shorter:
where pt_no ~in (10 13);
(The tilde (~) abbreviation of NOT might not be available on all keyboards, but you can use the caret (^) instead. The comma is optional in lists used with the IN operator. These lists may contain more than two values.)
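To put this together with the procedure posted earlier in the thread, here is a minimal sketch. It assumes that pt_no is a numeric ID variable in the ctpeat3 data set. The patient numbers 10 and 13 are only the illustrative values from the example above.

```sas
/* Sketch: exclude several patients with a single WHERE statement.     */
/* Assumption: pt_no is a numeric ID variable present in ctpeat3.      */
proc ttest data=ctpeat3;
   where pt_no not in (10, 13);  /* drop patients 10 and 13 from the analysis */
   class CTPPP;
   var eatv;
run;
```

The WHERE statement only filters the observations read by this one procedure step. The ctpeat3 data set itself is left unchanged.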
You wrote that the observation to be excluded was a "massive outlier." In this case, an alternative would be to replace the hard-coded condition on PT_NO with a condition on the measurement variable(s) that characterizes the data point as an extreme outlier (e.g. where 10 <= bmi <= 60;).
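Applied to the data set from this thread, a range condition could look roughly like the sketch below. The bounds 0 and 100 are purely hypothetical placeholders, since the plausible range of eatv was not given here, so substitute limits that make sense for your measurement.

```sas
/* Sketch: exclude outliers by value instead of by patient ID.         */
/* Assumption: 0 and 100 are hypothetical bounds, not taken from the   */
/* thread; replace them with the plausible range for eatv.             */
proc ttest data=ctpeat3;
   where 0 <= eatv <= 100;  /* keep only observations inside the range */
   class CTPPP;
   var eatv;
run;
```

Note that a condition like this also filters out observations where eatv is missing, because a missing numeric value compares as smaller than any number.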