02-18-2013 09:54 PM
I found a useful thread on winsorization on these forums; however the data in the example is not split into groups and I'm not able to get it to work on my data. The code that someone posted in the other thread is:
proc univariate data=have noprint;
var price size;
output out=temp pctlpts = 1 99 pctlpre = price size pctlname = pct1 pct99;
/* create 4 macro variables with the 4 interested values */
where (&price1< price<&price99) and (&size1 <size< &size99);
The only problem is that my data looks like the table below. I want to delete the prices at the 1 percentile and 99 percentile, keeping everything in between so that I can then take a mean of those prices for each symbol. However, I don't know how to tell SAS to do each SYMBOL on its own. Any ideas?
Thanks in advance.
02-18-2013 10:12 PM
In this case you may want to calculate your percentiles by group, so add in the by symbol to the proc univariate.
Then instead of a macro variable, I'd merge it in with by groups and then filter (untested below):
proc means data=have n p1 p99;
output out=ranges p1(price)=cutoff_low p99(price)=cutoff_high;
create table want as
select a.*, b.cutoff_low, b.cutoff_high
from have a
left join ranges b
You can add in your where clause into the sql or do it in a separate data step.
02-19-2013 06:17 PM
Thanks for your help.
The first part (proc means) works, but the second step does not give me the results I need. Once we identify the high cutoff and the low cutoff for each symbol, I want to create a new data set that keeps everything in between the cutoffs for each respective symbol. I've been scratching my head for while trying to figure this out. Any ideas?
02-19-2013 08:25 PM
What you seem to want is trimmed means, not Winsorized means. You can get them simply like this :
proc univariate data=have trimmed=0.02;
by symbol notsorted;
var price size;
where 0.02 is the proportion to be trimmed off.