My question is whether it is correct to consider both factors of my experiment when removing outliers, or whether I should consider only the main factor. Here is the model I was thinking of using:
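proc glm;  /* no DATA= given, so the most recently created data set is used */
  class BLOC factor1 factor2;
  model GMDadap = BLOC factor1 factor1*factor2;
  /* save raw and studentized residuals in data set DOIS */
  output out=dois residual=x_res student=x_stu;
run;
proc print data=dois;
run;
/* counts and means of the response by level of factor1 */
proc means data=dois n mean;
  class factor1;
  var GMDadap;
run;
/* normality tests and plots for the studentized residuals */
proc univariate normal plot data=dois;
  histogram x_stu / normal;
  var x_stu;
run;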
Outliers for what variable(s)?
Remove for which step?
What constitutes an "outlier", as in rule(s), for each variable?
This is a reasonable way to approach outlier detection. There are plenty of other methods, including ones for data that are not normally distributed (such as box plot outliers) and multivariate outlier detection, and possibly dozens more.
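As a rough illustration of the box plot rule mentioned above, here is a minimal sketch that flags residuals lying more than 1.5 interquartile ranges beyond the quartiles. It assumes the data set DOIS and the variable x_res from the code in the question. The data set names QUART and BOX_FLAG, the variable box_outlier, and the 1.5 multiplier are illustrative choices, not something from the original post.

```sas
/* Assumption: DOIS exists and contains x_res (raw residuals from PROC GLM). */
proc univariate data=dois noprint;
  var x_res;
  output out=quart q1=q1 q3=q3;   /* one-row data set holding the quartiles */
run;

data box_flag;
  if _n_ = 1 then set quart;      /* q1 and q3 are retained on every row */
  set dois;
  iqr = q3 - q1;
  /* flag values outside the usual box plot fences; missing residuals are not flagged */
  box_outlier = (not missing(x_res)) and
                ((x_res < q1 - 1.5*iqr) or (x_res > q3 + 1.5*iqr));
run;

proc print data=box_flag;
  where box_outlier = 1;
run;
```

This rule does not rely on normality, which is why it is sometimes preferred when the residuals look skewed.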
There's no universally agreed-upon method for detecting outliers. If you are going to fit a model with two factors, I think the outliers in Y ought to be detected via the residuals, which to me means that all terms in the model should be used, not just the main factor.
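To make the residual-based idea concrete, here is a minimal sketch that flags observations with a large studentized residual from the model fitted in the question. It assumes the data set DOIS and the variable x_stu created by the OUTPUT statement there. The data set name FLAGGED, the variable outlier, and the cutoff of 3 are assumptions for illustration only; the cutoff is a common convention, not a rule.

```sas
/* Assumption: DOIS contains x_stu, the studentized residuals saved by STUDENT=. */
data flagged;
  set dois;
  /* 1 if the studentized residual exceeds the chosen cutoff, 0 otherwise */
  outlier = (abs(x_stu) > 3);
run;

/* list only the flagged observations, with the factors and residuals */
proc print data=flagged;
  where outlier = 1;
  var BLOC factor1 factor2 GMDadap x_res x_stu;
run;
```

Because the residuals come from the model with all of its terms, an observation is flagged relative to what the model predicts for its particular block and factor combination, not relative to the overall mean or to one factor alone.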