Winzorization BY GROUP

Reply
Contributor
Posts: 24

Winzorization BY GROUP

Hello,

I found a useful thread on winsorization on these forums; however the data in the example is not split into groups and I'm not able to get it to work on my data.  The code that someone posted in the other thread is:

proc univariate data=have noprint;

   var price size;

   output out=temp pctlpts  = 1 99 pctlpre  = price size pctlname = pct1 pct99;

run;

/* create 4 macro variables with the 4 interested values */

data _null_;

  set temp;

  call symputx('price1',pricepct1);

  call symputx('price99',pricepct99);

  call symputx('size1',sizepct1);

  call symputx('size99',sizepct99);

run;

%put _user_;

data want;

   set have;

   where (&price1< price<&price99) and (&size1 <size< &size99);

run;

proc print;run;

The only problem is that my data looks like the table below.  I want to delete the prices at the 1 percentile and 99 percentile, keeping everything in between so that I can then take a mean of those prices for each symbol.  However, I don't know how to tell SAS to do each SYMBOL on its own.  Any ideas?

PriceSymbol
11AA
12AA
18AA
22AA
100XYZ
102XYZ
109XYZ
114XYZ

Thanks in advance.

-Razzle

Super User
Posts: 19,851

Re: Winzorization BY GROUP

Posted in reply to RazzleBayker

In this case you may want to calculate your percentiles by group, so add in the by symbol to the proc univariate.

Then instead of a macro variable, I'd merge it in with by groups and then filter (untested below):

proc means data=have n p1 p99;

class symbol;

var price;

output out=ranges p1(price)=cutoff_low p99(price)=cutoff_high;

run;

proc sql;

     create table want as

select a.*, b.cutoff_low, b.cutoff_high

from have a

left join ranges b

on a.symbol=b.symbol;

quit;

You can add in your where clause into the sql or do it in a separate data step.

Contributor
Posts: 24

Re: Winzorization BY GROUP

Thanks for your help.

The first part (proc means) works, but the second step does not give me the results I need.  Once we identify the high cutoff and the low cutoff for each symbol, I want to create a new data set that keeps everything in between the cutoffs for each respective symbol.  I've been scratching my head for while trying to figure this out.  Any ideas?

Super User
Posts: 19,851

Re: Winzorization BY GROUP

Posted in reply to RazzleBayker

Did you add in the where clause?

If so show your code.

Respected Advisor
Posts: 4,930

Re: Winzorization BY GROUP

Posted in reply to RazzleBayker

What you seem to want is trimmed means, not Winsorized means. You can get them simply like this :

proc univariate data=have trimmed=0.02;

by symbol notsorted;

var price size;

run;

where 0.02 is the proportion to be trimmed off.

PG

PG
Ask a Question
Discussion stats
  • 4 replies
  • 288 views
  • 6 likes
  • 3 in conversation