09-10-2014 02:27 AM
I am curious whether I should winsorize the data before or after constructing a new variable.
For example, suppose I want to find Q, where Q = P/R.
Should I winsorize P and R and then construct Q, or should I compute Q first and then winsorize it directly?
Will these two approaches give the same result, and which one is more common?
Thank you very much!
09-10-2014 08:12 AM
Somehow I get the impression that there are multiple measurements of P and R on a number of records. If not, then I do not see how you would get multiple values for Q, and thus have a need for a robust estimator. There would simply be a single estimate of Q based on the ratio of the winsorized values of P and R.
Thus, the more interesting situation is multiple instances of P and R for multiple subjects (for want of a better term). Now we need to address the issue of P and R--are they independent measures, or are they concomitant, so that the ratio Q is immediately apparent? This will make a huge difference in approach. More information about how the data fit together is needed for an answer to this question.
09-10-2014 03:56 PM
Thank you very much for your suggestion. To be more specific, I am constructing financial variables such as ROA, market-to-book value, and sales growth. ROA, for example, is net income divided by total assets. My question is which of these is the correct approach:
1) Winsorize net income and total assets, then compute ROA, or
2) Compute ROA from the original net income and total assets data first, then winsorize ROA
Thank you very much
09-11-2014 08:00 AM
Mostly this is going to depend on which variable you need a robust estimator for. One thing to watch out for when winsorizing the numerator and denominator first is a strong positive correlation between the two. Winsorizing them separately clips each one at its own cutoffs, so it may remove a lot of that correlation and leave you with a poor estimator of the ratio.
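To make the point about ordering concrete, here is a minimal sketch that compares the two approaches on simulated data. Everything in it is an assumption for illustration: the `winsorize` helper, the 1%/99% cutoffs, and the made-up "assets" and "income" series (income is built to be strongly correlated with assets). It is not a recommendation of cutoffs or a reproduction of anyone's actual data.

```python
import random

def winsorize(values, lower=0.01, upper=0.99):
    """Clamp values to empirical quantile cutoffs (simple nearest-rank rule)."""
    ordered = sorted(values)
    n = len(ordered)
    lo = ordered[int(lower * (n - 1))]
    hi = ordered[int(upper * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

random.seed(0)
n = 5000
# Hypothetical firm data: total assets are log-normal, and net income is
# about 5% of assets plus proportional noise, so the two move together.
assets = [random.lognormvariate(5.0, 1.5) for _ in range(n)]
income = [0.05 * a + random.gauss(0.0, 0.02 * a) for a in assets]

# Option 1: winsorize numerator and denominator separately, then divide.
roa_parts_first = [p / r for p, r in zip(winsorize(income), winsorize(assets))]

# Option 2: compute the ratio from the raw data, then winsorize the ratio.
roa_ratio_first = winsorize([p / r for p, r in zip(income, assets)])

# Count observations where the two orders give different ROA values.
changed = sum(abs(a - b) > 1e-12 for a, b in zip(roa_parts_first, roa_ratio_first))
print(f"observations where the two orders disagree: {changed} of {n}")
print(f"range, parts first: {min(roa_parts_first):.4f} to {max(roa_parts_first):.4f}")
print(f"range, ratio first: {min(roa_ratio_first):.4f} to {max(roa_ratio_first):.4f}")
```

With correlated inputs like these, clipping the parts separately changes the ratio for firms whose assets or income are extreme even when their ratio is ordinary, while leaving extreme ratios among mid-sized firms untouched.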
The ratio of two random variables has awkward distributional properties: its expected value and variance are not simple functions of the two means, but also involve the variances and the covariance of the terms. (If the two are independent, the ratio can follow a Cauchy or generalized Cauchy distribution, as happens for independent zero-mean normals, and neither of those has a first moment.) If you do winsorize first, then the winsorized means, variances and covariances ought to be used to construct an approximate expected value of the ratio.
The expectation of a ratio of random variables (Taylor series expansion) and the variance of a ratio of random variables can be found in the attached file.
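For reference, the Taylor series approximations are usually written in the following textbook form. This is the standard delta-method result, not necessarily the notation used in the attachment. Here $\mu_P$ and $\mu_R$ denote the means of P and R, and the approximation assumes R stays well away from zero; if you winsorize first, the winsorized moments would be plugged in.

```latex
\begin{align}
E\!\left[\frac{P}{R}\right] &\approx \frac{\mu_P}{\mu_R}
  - \frac{\operatorname{Cov}(P,R)}{\mu_R^{2}}
  + \frac{\mu_P \operatorname{Var}(R)}{\mu_R^{3}} \\
\operatorname{Var}\!\left(\frac{P}{R}\right) &\approx \frac{\mu_P^{2}}{\mu_R^{2}}
  \left[ \frac{\operatorname{Var}(P)}{\mu_P^{2}}
  - \frac{2\operatorname{Cov}(P,R)}{\mu_P \mu_R}
  + \frac{\operatorname{Var}(R)}{\mu_R^{2}} \right]
\end{align}
```

The covariance terms are where a strong positive correlation between numerator and denominator enters, which is why clipping the two separately can shift the result.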