Deekshana
Calcite | Level 5

Hi, I am new to SAS and statistics.

 

What is the best approach for transforming non-normal data (with positive, negative, and zero values) to a normal distribution?

I came across a few websites that suggest adding a constant and then applying a log transformation, but others say this is not a good approach.

 

I have attached a file (T test file) on which I want to run a t-test on the "overall difference" variable to see whether my template worked. The post-test score is the score after the template (i.e. instructions) was given, whereas the pre-test score is the score before it was given.

The data is skewed, so I ran a normality check (Shapiro-Wilk) and got a p-value less than 0.05.

 

Based on the Shapiro-Wilk result and the histogram of the "overall score difference" variable, I concluded that the variable is not normally distributed, so I can't run a t-test, and I decided to transform it toward normality. But my difference variable has negative, positive, and zero values.

Thanks in advance for your help. Please ask me questions if I am not clear, as this is my first post.

ballardw
Super User

With a smallish sample, 31 records, and paired observations, prescore and postscore, I would be strongly tempted to look at a non-parametric test such as Wilcoxon.

Deekshana
Calcite | Level 5

Hi ballardw, I remember from basic stats that if n<30 we use Wilcoxon, since the t-test applies when n>30.

But my sample has n>30, so should I use Wilcoxon, or transform the actual data and then do a t-test?

ballardw
Super User

@Deekshana wrote:

Hi ballardw, I remember from basic stats that if n<30 we use Wilcoxon, since the t-test applies when n>30.

But my sample has n>30, so should I use Wilcoxon, or transform the actual data and then do a t-test?


Before going to the complexity of transforming the data, I would tend to run both a Wilcoxon test and a t-test (PROC TTEST with the PAIRED statement) and see how the results look.
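As a rough illustration of what running both tests involves, here is a minimal sketch in Python rather than SAS, using only the standard library. It computes the paired t statistic and the Wilcoxon signed-rank sums from the post-minus-pre differences. The function names and the scores are made up for illustration (they are not the poster's data), and the sketch stops at the test statistics without computing p-values. Dropping zero differences and averaging tied ranks is one common convention, assumed here.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) for a paired t-test on post - pre."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, W_minus, n_used) for the Wilcoxon signed-rank test.

    Assumed convention: zero differences are dropped and tied absolute
    differences share their average rank.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of equal absolute differences
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus, len(ordered)


# Hypothetical pre/post scores, for illustration only.
pre = [10, 12, 9, 14, 11, 13, 8, 15]
post = [12, 12, 8, 17, 15, 18, 14, 22]
print(paired_t_statistic(pre, post))
print(wilcoxon_signed_rank(pre, post))
```

Note that the signed-rank test works directly on negative, positive, and zero differences, so no transformation is needed before applying it.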

 

Also, those guidelines about thirty are general in nature. A Wilcoxon test is just less efficient if the data is actually normal, but it will work on much larger data sets. Its advantage is that the only requirement is that the numeric data values have some actual meaning, such as a measurement.

The >30 guideline exists because, when you take multiple groups of more than 30 records, the group means tend to behave more or less normally. So you likely don't need to do a transform at all, which is why I suggest trying both and comparing results.
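The point about means behaving more or less normally can be checked with a small simulation. The sketch below, again in Python rather than SAS, draws strongly skewed values (an exponential distribution, chosen here purely as an assumption for illustration) and compares the skewness of the raw values with the skewness of the means of groups of 31. The function names, the seed, and the number of repetitions are all hypothetical choices.

```python
import random
from statistics import mean


def skewness(values):
    """Sample skewness: third central moment over variance**1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_means(n=31, reps=5000, seed=1):
    """Return (skewness of raw draws, skewness of means of n draws)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    raw = [rng.expovariate(1.0) for _ in range(reps)]
    means = [mean([rng.expovariate(1.0) for _ in range(n)])
             for _ in range(reps)]
    return skewness(raw), skewness(means)


raw_skew, mean_skew = simulate_means()
print("skewness of raw values:      ", round(raw_skew, 2))
print("skewness of means of 31 each:", round(mean_skew, 2))
```

The group means should come out far less skewed than the raw values, which is the behaviour the >30 guideline relies on. How quickly this happens depends on how skewed the underlying data is, so it is a rule of thumb rather than a guarantee.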

 
