jjsingh04
Obsidian | Level 7

As shown in the following example, the order of operations--demeaning vs. winsorizing--makes a big difference in the results. Which should I do first? 

 

In the example below, we start with 10 observations: 0, 0, 50, 50, 50, 50, 70, 70, 80, and 80. In the columns on the left, we demean first; in the columns on the right, we winsorize first. (I normally winsorize at the 1% and 99% levels rather than 10% and 90%, but I used the latter to keep the example simple.)

 

Thanks! 

 

 

Demean first                                  | Winsorize first
Observed X  Demeaned X  Winsorized (10%,90%)  | Observed X  Winsorized (10%,90%)  Demeaned X
0           -50         0                     | 0           50                    -8
0           -50         0                     | 0           50                    -8
50          0           0                     | 50          50                    -8
50          0           0                     | 50          50                    -8
50          0           0                     | 50          50                    -8
50          0           0                     | 50          50                    -8
70          20          20                    | 70          70                    12
70          20          20                    | 70          70                    12
80          30          20                    | 80          70                    12
80          30          20                    | 80          70                    12
Mean: 50                                      | Mean:       58
Our lives are enriched by the people around us.
Accepted Solutions
PGStats
Opal | Level 21

If the purpose of these operations is to protect against outliers, you should winsorize before centering, because outliers can have a very large influence on the mean used for centering. So it is better to deal with them first.
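
A minimal Python sketch of the two orderings, using the ten observations from the question. The `winsorize` and `demean` helpers are hypothetical names, and replacing the two lowest and two highest values (`k = 2`) is an assumption chosen to reproduce the table in the question; a percentile-based rule may place the cutoffs differently.

```python
from statistics import mean


def winsorize(values, k):
    """Replace the k smallest and k largest values with the nearest retained value."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]


def demean(values):
    """Subtract the sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]


x = [0, 0, 50, 50, 50, 50, 70, 70, 80, 80]
k = 2  # assumption: two values replaced in each tail, as in the question's table

demean_first = winsorize(demean(x), k)     # left-hand columns of the table
winsorize_first = demean(winsorize(x, k))  # right-hand columns of the table

print(demean_first)           # [0, 0, 0, 0, 0, 0, 20, 20, 20, 20]
print(winsorize_first)        # [-8, -8, -8, -8, -8, -8, 12, 12, 12, 12]
print(mean(demean_first))     # 8
print(mean(winsorize_first))  # 0
```

With these data, winsorizing first leaves the result centered at 0, while demeaning first leaves a mean of 8, because the two zeros pull the centering mean down to 50.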

PG

Rick_SAS
SAS Super FREQ

To echo PGStats, the Winsorized mean is a robust estimate of location. If your goal is to center the data in a robust way, use a robust estimate of location. If you are going to scale the data, use a robust estimate of scale.
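
As a rough illustration of robust centering and scaling, here is a small Python sketch. The reply does not name a scale estimator; the median absolute deviation (MAD) used here is an assumption, as are the helper names `winsorized_mean` and `mad` and the choice of `k = 2` values replaced per tail.

```python
from statistics import mean, median


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values with the nearest retained value."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return mean([min(max(v, low), high) for v in values])


def mad(values):
    """Median absolute deviation from the median (unscaled); an assumed choice of robust scale."""
    med = median(values)
    return median([abs(v - med) for v in values])


x = [0, 0, 50, 50, 50, 50, 70, 70, 80, 80]

center = winsorized_mean(x, k=2)  # robust location
scale = mad(x)                    # robust scale

# Center and scale every observation with the robust estimates.
standardized = [(v - center) / scale for v in x]

print(center)  # 58
print(scale)   # 20.0
```

Other robust scale estimates could be substituted for `mad` without changing the overall pattern.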

jjsingh04
Obsidian | Level 7

Thanks very much Rick! What you and PG are saying makes perfect sense! 

J.J.

jjsingh04
Obsidian | Level 7

Thanks very much PG! What you and Rick are saying makes perfect sense! 

J.J.

 

