Programming the statistical procedures from SAS

Winsorize independent variable for two groups separately or for entire population

Occasional Contributor
Posts: 9

 Hi,

 

I'm working with a dataset that combines two separate datasets. Certain variables are positively skewed. I have decided to winsorize them to address this, but I'm not sure whether I should winsorize the variables from the two datasets separately or all together.

 

For example (winsorize at 75th percentile):

 

Dataset  Freckles  Winsorize_together_75  Winsorize_groups_separately_75
1        10        10                     10
1        15        15                     15
1        20        20                     20
1        99        75                     99
1        100       75                     99
1        10        10                     10
2        15        15                     15
2        20        20                     20
2        20        20                     20
2        25        25                     25
2        75        75                     55
2        105       75                     55
2        35        35                     35
2        35        35                     35

 

Should I winsorize the positively skewed variable for the overall dataset or for the two datasets separately?

 

Thanks!

 

Trusted Advisor
Posts: 1,413

Re: Winsorize independent variable for two groups separately or for entire population

The decision on what analysis to do depends on the goal of the analysis. Can you please state the goal of your analysis? Thanks.

Occasional Contributor
Posts: 9

Re: Winsorize independent variable for two groups separately or for entire population

Thanks for your reply. I'm trying to perform a t-test.

 

To give you more background, the individuals all come from one online community but were divided based on how they responded to a particular question. 

Trusted Advisor
Posts: 1,413

Re: Winsorize independent variable for two groups separately or for entire population


he2182 wrote:

Thanks for your reply. I'm trying to perform a t-test.

 

To give you more background, the individuals all come from one online community but were divided based on how they responded to a particular question. 


This doesn't really tell us whether the distributions of the individuals are the same or different, based on how they responded to a particular question. So, I don't really have a recommendation based on this about how to perform the winsorizing.

 

But I do agree completely with @SteveDenham on this matter, in which case the issue of how to winsorize isn't relevant.

SAS Super FREQ
Posts: 3,305

Re: Winsorize independent variable for two groups separately or for entire population

To expand on PaigeMiller's suggestion, do you think the data are coming from a single population or from two different populations?

 

If you merge and then Winsorize, you are assuming that each sample is drawn from the same population. If you Winsorize separately, you are implicitly assuming that each sample comes from its own population, which makes me wonder whether it is appropriate to merge them together.
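
To make the two options concrete, here is a minimal sketch in Python (not SAS) using the Freckles values from the example table in the original post. The helper names `percentile` and `winsorize_upper` are hypothetical, and the percentile definition (linear interpolation between order statistics) is an assumption; other definitions give different caps, so the numbers will not match the hand-made illustrative columns in the original post.

```python
def percentile(values, p):
    """Percentile by linear interpolation between order statistics (p in [0, 1])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def winsorize_upper(values, p):
    """Cap every value above the p-th percentile at that percentile."""
    cap = percentile(values, p)
    return [min(v, cap) for v in values]


# Freckles values from the example table in the original post
group1 = [10, 15, 20, 99, 100, 10]
group2 = [15, 20, 20, 25, 75, 105, 35, 35]

# Option A: merge first, then winsorize (one cap shared by both groups)
pooled_cap = percentile(group1 + group2, 0.75)
pooled_g1 = [min(v, pooled_cap) for v in group1]
pooled_g2 = [min(v, pooled_cap) for v in group2]

# Option B: winsorize each group on its own (one cap per group)
cap_g1 = percentile(group1, 0.75)
cap_g2 = percentile(group2, 0.75)
separate_g1 = winsorize_upper(group1, 0.75)
separate_g2 = winsorize_upper(group2, 0.75)

print("pooled cap:", pooled_cap)
print("separate caps:", cap_g1, cap_g2)
```

Under this percentile definition the pooled cap is 65, while the separate caps are 79.25 for dataset 1 and 45 for dataset 2. The same raw value can therefore be treated very differently depending on which population assumption you make.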

Trusted Advisor
Posts: 1,413

Re: Winsorize independent variable for two groups separately or for entire population


Rick_SAS wrote:

To expand on PaigeMiller's suggestion, do you think the data are coming from a single population or from two different populations?



Well, Rick, that's a good point, but it wasn't my point. My point is that if you are looking to compare means or medians, that might lead to one decision, and if you are looking to compare standard deviations or variances, you might make a different decision. We don't know what comparison or analysis the user wants to do.
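
As a rough illustration of why the goal matters, the sketch below (Python, standard library only, not SAS) summarizes the example Freckles values under each winsorizing choice. The helper `cap_at_q3` is hypothetical, and the use of the "inclusive" quantile method is an assumption; the data are the illustrative values from the original post, not the poster's real data.

```python
from statistics import mean, quantiles, variance


def cap_at_q3(values, reference):
    """Cap values at the 75th percentile of `reference` (inclusive method)."""
    q3 = quantiles(reference, n=4, method="inclusive")[2]
    return [min(v, q3) for v in values]


# Freckles values from the example table in the original post
group1 = [10, 15, 20, 99, 100, 10]
group2 = [15, 20, 20, 25, 75, 105, 35, 35]
pooled = group1 + group2

versions = {
    "raw": (group1, group2),
    "pooled cap": (cap_at_q3(group1, pooled), cap_at_q3(group2, pooled)),
    "separate caps": (cap_at_q3(group1, group1), cap_at_q3(group2, group2)),
}

# Difference in means and per-group sample variances for each version
summary = {}
for label, (g1, g2) in versions.items():
    summary[label] = {
        "mean_diff": mean(g1) - mean(g2),
        "var_g1": variance(g1),
        "var_g2": variance(g2),
    }
    print(label, summary[label])
```

With these illustrative numbers, the difference in means is negative with a pooled cap and positive with separate caps, and both choices shrink the sample variances relative to the raw data. A comparison of means and a comparison of variances are thus affected in different ways by the same winsorizing decision.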

Respected Advisor
Posts: 2,655

Re: Winsorize independent variable for two groups separately or for entire population

I don't particularly care for Winsorizing data.  Trading a long tail for a heavy tail accomplishes little insofar as having a normal distribution, and it grossly underestimates the true variance, no matter what the distribution.  If there isn't a particular process known to generate the data (waiting times, counts, etc.), then it's probably not a good idea to assume an underlying non-normal distribution, such as a gamma or Poisson.  Which means:

 

Why not use a nonparametric test?  The median is probably a better indicator of central tendency for these samples in any case, so a Wilcoxon rank sum test would be nearly ideal for what I think the OP is trying to do.

 

Steve Denham
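
For readers who want to see what the rank sum test computes, here is a minimal sketch in Python (standard library only). In SAS this test would normally come from PROC NPAR1WAY with the WILCOXON option; the code below is not that procedure. The function names `midranks` and `rank_sum_test` are hypothetical, and the sketch assumes a tie-corrected normal approximation with no continuity correction and no exact p-value, which is crude for samples this small. The data are the illustrative Freckles values from the original post.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, z, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    w = sum(midranks(combined)[:n1])          # rank sum of the first sample
    expected = n1 * (n + 1) / 2               # mean of W under the null
    tie_term = sum(t**3 - t for t in Counter(combined).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - expected) / sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


group1 = [10, 15, 20, 99, 100, 10]
group2 = [15, 20, 20, 25, 75, 105, 35, 35]
w, z, p = rank_sum_test(group1, group2)
print("W =", w, "z =", round(z, 3), "p =", round(p, 3))
```

Because the test uses only ranks, capping the largest values changes the result only where it creates new ties, which is part of why winsorizing becomes a non-issue with this approach.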

Occasional Contributor
Posts: 9

Re: Winsorize independent variable for two groups separately or for entire population

Thank you for your suggestion.  

 

I have already performed the Wilcoxon rank sum test but also wanted to do a t-test.

Respected Advisor
Posts: 2,655

Re: Winsorize independent variable for two groups separately or for entire population

Why do the t test?  You have already tested whether the two groups differ as far as location.  If you did do another test, did you plan on adjusting the p value for multiple testing?  Or, and I really hope this is not the case, were you going to keep doing tests until you found one that agreed with your hoped-for outcome?  I offer the following from John Tukey: "The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."

Steve Denham

Occasional Contributor
Posts: 9

Re: Winsorize independent variable for two groups separately or for entire population

Thank you for your concern. I wanted to report both results. The Wilcoxon rank sum test results were significant, so worry not, we were not on a fishing expedition.
