rjnc13
Calcite | Level 5

Hi,

I have the following two data sets and below is the delay in days between actions by each of the sample groups.

The split was as follows:

Group A Test -  20%

Group B Control - 80%

With the data being negatively skewed, I'm just curious as to the best method for testing whether there is a significant difference in days between the groups.

Firstly, as it is negatively skewed, am I best to normalise the data set by using a logarithmic function?

Secondly, for testing the significance, should I be using the Kolmogorov-Smirnov test or the Wilcoxon-Mann-Whitney test (skewed distributions)?

I have provided the output of the distribution of the groups below.

Any help would be greatly appreciated.

Thanks!

lvm
Rhodochrosite | Level 12

A log transformation would definitely not work: it compresses the right tail and stretches the left, which makes negative skew worse (the opposite effect from what you need). You could try the Box-Cox transformation. This can be done with PROC TRANSREG (see example 6). If the data for the two groups are stacked, with g=0 for the first group and g=1 for the second, then you would use:

model boxcox(y) = identity(g);

rjnc13
Calcite | Level 5

Thanks for the reply. Do you have a reference for example 6?

lvm
Rhodochrosite | Level 12

The reference is the SAS/STAT User's Guide. It was example 6 in an older version of SAS. This example has been replaced by a more complex example in SAS 9.3 and later:

SAS/STAT(R) 9.3 User's Guide

This is more complex than you need.

art297
Opal | Level 21

I think the example you mentioned can still be found at: Documentation

rjnc13
Calcite | Level 5

Thanks again for your prompt response.

The groups are stacked, so I have assigned a numeric value to each (Control = 0, Test = 1).

However, when I run the step below, SAS reports that invalid values were encountered, and I cannot find any more information online as to the cause.

proc transreg data=test;
   model boxcox(day_delay)=identity(group_class);
run;

Looking at the values, all are valid numbers and are formatted to be numeric (there are 0s in the data set).

Thanks again!

lvm
Rhodochrosite | Level 12

Oops, I wasn't paying attention to the fact that you have 0s. This transformation is only defined for strictly positive values. You could add a small constant to all the data, which is commonly done but ad hoc. You might want to resort to nonparametric methods instead. You could use either of the NP tests you mentioned.
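
For reference, the add-a-constant workaround could be sketched as follows. This assumes the data set test and the variables day_delay and group_class from the post above; the shift of 1 and the names test_shift and day_delay_shift are arbitrary choices for illustration only, and the estimated transformation can depend on the constant chosen.

```sas
/* Shift the response so that every value is strictly positive.    */
/* The constant 1 is an arbitrary, illustrative choice (ad hoc).   */
data test_shift;
   set test;
   day_delay_shift = day_delay + 1;
run;

/* Same Box-Cox model as before, fitted to the shifted response.   */
proc transreg data=test_shift;
   model boxcox(day_delay_shift) = identity(group_class);
run;
```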

Reeza
Super User

Do you need to transform them if you're using nonparametric tests?

The key is that the distributions of the two groups have the same shape, which, by eyeballing, I'd say holds pretty well.

lvm
Rhodochrosite | Level 12

Do not transform.

rjnc13
Calcite | Level 5

Thanks again for your reply. So I don't need to transform, as the Kolmogorov-Smirnov test and the Wilcoxon-Mann-Whitney test are both nonparametric.

Would you trial both tests? Or based on the above is one a better fit than the other?

rjnc13
Calcite | Level 5

Thanks for your reply, it makes sense! Based on the above information, would you use the Kolmogorov-Smirnov test or the Wilcoxon-Mann-Whitney test?

Reeza
Super User

Google says:

GraphPad Statistics Guide

In the field I'm in (medical/health) I've usually used KS.
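
If it helps, both tests can be requested in a single PROC NPAR1WAY step. This is only a sketch, assuming the stacked data set test with the variables day_delay and group_class used earlier in the thread:

```sas
/* WILCOXON requests the Wilcoxon rank-sum (Mann-Whitney) test.    */
/* EDF requests empirical distribution function statistics, which  */
/* include the two-sample Kolmogorov-Smirnov test.                 */
proc npar1way data=test wilcoxon edf;
   class group_class;   /* 0 = Control, 1 = Test */
   var day_delay;
run;
```

Roughly speaking, the Wilcoxon test is most sensitive to a shift in location between the groups, while Kolmogorov-Smirnov responds to any difference between the two distributions, including shape and spread.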

