Hi, I am new to SAS and statistics.
What is the best approach to transform a non-normal distribution (with positive, negative, and zero values) to a normal one?
I came across a few websites that suggested a log transformation after adding a constant, but some say this is not a good approach.
I have attached a file (T test file) on which I want to run a t test on the "overall difference" variable to see if my template worked better. Here the post-test score is the score after the template was given, whereas the pre-test score is the score before the template (i.e. instructions) was given.
The data is skewed, and when I ran a normality check (Shapiro-Wilk) I got a p-value less than 0.05.
So based on the Shapiro-Wilk result and my histogram of the "overall score difference" variable, I concluded that the variable is not normally distributed, so I can't run a t test and decided to apply a normalizing transformation. But my difference variable has negative, positive, and zero values.
Thanks in advance for your help. Please ask me questions if I am not clear, as this is my first post.
With a smallish sample (31 records) and paired observations (prescore and postscore), I would be strongly tempted to look at a non-parametric test such as the Wilcoxon signed-rank test.
Hi ballardw, I remember from basic stats that if n<30 we use Wilcoxon, since the t test applies when n>30.
But my sample has n>30, so should I use Wilcoxon, or else transform the actual data and then do a t test?
@Deekshana wrote:
Hi ballardw, I remember from basic stats that if n<30 we use Wilcoxon, since the t test applies when n>30.
But my sample has n>30, so should I use Wilcoxon, or else transform the actual data and then do a t test?
Before going to the complexity of transforming the data, I would tend to run both a Wilcoxon test and PROC TTEST with the PAIRED statement and see how the results look.
Also, those guidelines about thirty are general in nature. A Wilcoxon test is just less efficient if the data is actually normal, but it will work on much larger data sets. Its advantage is that the only requirement is that the numeric data values have some actual meaning, such as a measurement.
The n>30 guideline comes from the fact that the means of groups of more than about 30 records tend to behave more or less normally (the central limit theorem), even when the individual values do not. So you likely don't need to do a transform at all, which is why I suggest trying both and comparing results.
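To show what "trying both" actually computes, here is a minimal sketch in Python (not SAS), using only the standard library. The scores are made up for illustration, and the function names paired_t and wilcoxon_signed_rank are hypothetical helpers, not anything from SAS. It only produces the test statistics; the p-values would come from your statistical software. Note that both tests work directly on the differences, so negative and zero values are not a problem.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for a paired t test."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def wilcoxon_signed_rank(pre, post):
    """Return (W+, W-, n_nonzero) for the Wilcoxon signed-rank test.

    Zero differences are dropped; tied absolute differences share
    the average of the ranks they occupy.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus, len(ordered)


# Hypothetical pre/post scores (assumption: not the poster's data)
pre = [10, 12, 9, 14, 11, 13]
post = [12, 12, 8, 17, 15, 10]

t_stat, df = paired_t(pre, post)
w_plus, w_minus, n_used = wilcoxon_signed_rank(pre, post)
print(f"paired t = {t_stat:.4f} on {df} df")
print(f"W+ = {w_plus}, W- = {w_minus}, nonzero pairs = {n_used}")
```

If the two tests point to the same conclusion on your 31 pairs, that is a reasonable sign the skewness is not driving the result.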