How to winsorize correctly?

09-10-2014 02:27 AM

I am curious whether I should winsorize the data before or after constructing a new variable.

For example, suppose I want to find Q, where Q = P/R.

Should I winsorize P and R and then construct Q, or should I compute Q first and then winsorize Q directly?

Will these two approaches give the same result, and which one is more common?

Thank you very much!

09-10-2014 08:12 AM

Somehow I get the impression that there are multiple measurements of P and R on a number of records. If not, then I do not see how you would get multiple values for Q, and thus have a need for a robust estimator. There would simply be a single estimate of Q based on the ratio of the winsorized values of P and R.

Thus, the more interesting situation is multiple instances of P and R for multiple subjects (for want of a better term). Now we need to address the issue of P and R--are they independent measures, or are they concomitant, so that the ratio Q is immediately apparent? This will make a huge difference in approach. More information about how the data fit together is needed for an answer to this question.

Steve Denham

09-10-2014 03:56 PM

Thank you very much for your suggestion. To be more specific, I am creating financial variables such as ROA, market-to-book value, and sales growth. ROA, for example, is net income divided by total assets. My question is which of the following is the correct approach:

1) Winsorize net income and total assets, then compute ROA, or

2) Compute ROA from the original net income and total assets data first, then winsorize ROA

Thank you very much

09-11-2014 08:00 AM

Mostly this is going to depend on which variable you need a robust estimator for. One thing to watch out for when winsorizing the numerator and denominator first is a strong positive correlation between the two. Winsorizing first may remove much of that correlation, which can leave you with a poor estimator of the ratio.
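To illustrate the point about correlation, here is a minimal Python sketch (not SAS, and not taken from the thread) that compares the two orders of operation on simulated data. The data-generating process, the 1%/99% cutoffs, the nearest-rank quantile rule, and the function name `winsorize` are all assumptions chosen for illustration only.

```python
import random

def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to empirical quantiles (simple nearest-rank rule, an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]
    hi = ordered[int(upper * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

random.seed(0)
# Hypothetical firms: net income scales with total assets, so the two
# are strongly positively correlated.
assets = [random.lognormvariate(5, 1.5) for _ in range(1000)]
income = [a * random.gauss(0.05, 0.03) for a in assets]

# Approach 1: winsorize numerator and denominator, then take the ratio.
roa_components = [p / r for p, r in zip(winsorize(income), winsorize(assets))]

# Approach 2: take the ratio first, then winsorize the ratio.
roa_ratio = winsorize([p / r for p, r in zip(income, assets)])

# Largest per-firm disagreement between the two approaches.
max_gap = max(abs(a - b) for a, b in zip(roa_components, roa_ratio))
print("largest difference in ROA between approaches:", max_gap)
```

In this sketch the two approaches disagree for firms whose assets or income fall in the tails, because clipping one component without the other changes their ratio even when the ratio itself is unremarkable.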

The ratio of two random variables has awkward distributional properties: its expected value and variance are not simple functions of the two means, but also involve the variance of each term and the covariance between them. (If the two are independent, the ratio could follow a Cauchy or generalized Cauchy distribution, neither of which has a first moment.) If you do winsorize first, then the winsorized means, variances and covariances ought to be used to construct an approximate expected value of the ratio.

The expectation of a ratio of random variables (via Taylor series expansion) and the variance of a ratio of random variables can be found in the attached file.
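For readers without the attachment, the commonly quoted Taylor series (delta-method) approximations are sketched below. The notation is an assumption and may differ from the attached file: \mu_P and \mu_R are the means, \sigma_P^2 and \sigma_R^2 the variances, and \sigma_{PR} the covariance of P and R. The approximations only make sense when \mu_R is well away from zero.

```latex
\begin{align}
\operatorname{E}\!\left[\frac{P}{R}\right]
  &\approx \frac{\mu_P}{\mu_R}
   - \frac{\sigma_{PR}}{\mu_R^{2}}
   + \frac{\mu_P\,\sigma_R^{2}}{\mu_R^{3}}, \\[4pt]
\operatorname{Var}\!\left[\frac{P}{R}\right]
  &\approx \frac{\mu_P^{2}}{\mu_R^{2}}
   \left(
     \frac{\sigma_P^{2}}{\mu_P^{2}}
     - \frac{2\,\sigma_{PR}}{\mu_P\,\mu_R}
     + \frac{\sigma_R^{2}}{\mu_R^{2}}
   \right).
\end{align}
```

If P and R are winsorized first, the winsorized means, variances and covariance would be the ones plugged into these expressions.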

Steve Denham

09-14-2014 01:11 PM

Thank you very much for your explanation. It is very useful.

Mew Piriyakul

09-10-2014 08:37 AM

Did you mean how to look for an outlier? If value <= mean - 2*std or value >= mean + 2*std, it should be treated as an outlier.
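The rule above can be sketched in Python as follows. This is only an illustration: the function name `flag_outliers`, the sample data, and the use of the sample standard deviation are assumptions, not part of the original suggestion.

```python
from statistics import mean, stdev

def flag_outliers(values, k=2.0):
    """Flag values at least k sample standard deviations from the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption)
    return [v <= m - k * s or v >= m + k * s for v in values]

# Hypothetical data with one extreme observation.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
flags = flag_outliers(data)
print([v for v, f in zip(data, flags) if f])
```

Note that the extreme value itself inflates both the mean and the standard deviation, so this rule can miss outliers in small or heavily contaminated samples.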