winsorization

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 17
Accepted Solution

winsorization

Hi does anyone know how to winsorize variables in a data? Let say you have this variable called size and prices in your data and you want them to be 'winsorized' at the 1st and 99th percentile? thanks!


Accepted Solutions
Solution
‎05-02-2012 10:06 AM
Super Contributor
Posts: 1,636

Re: winsorization

Posted in reply to rilatotoro

Is this what you want?

data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 2
30 2
45 1
38 3
20 3
39 3
46 1
;
run;

proc univariate data=have noprint;
   var price size;
   output out=temp pctlpts  = 1 99 pctlpre  = price size pctlname = pct1 pct99;
run;
/* create 4 macro variables with the 4 interested values */
data _null_;
  set temp;
  call symputx('price1',pricepct1);
  call symputx('price99',pricepct99);
  call symputx('size1',sizepct1);
  call symputx('size99',sizepct99);
run;
%put _user_;
data want;
   set have;
   where (&price1< price<&price99) and (&size1 <size< &size99);
run;
proc print;run;
                                        Obs    price    size

                                         1       10       3
                                         2       20       2
                                         3       30       2
                                         4       38       3
                                         5       20       3
                                         6       39       3

Linlin

View solution in original post


All Replies
Super Contributor
Posts: 349

Re: winsorization

Posted in reply to rilatotoro

Hi,

Is this the output you required...

data have;

input price size;

cards;

10 3

20 5

30 5

45 1

2 3

20 5

30 5

45 1

38 3

20 5

39 5

55 1

;

run;

proc univariate data=have noprint;

   var price size;

   output out=want mean=mean std=std pctlpts  = 1 99 pctlpre  = price size

                                       pctlname = pct1 pct99;

run;

proc sql;

select min(price) into :minimum from have where price in(select price from have having price<>MIN(price) );

select max(price) into :maximum from have where price in(select price from have having price<>max(price) );

select min(size) into :sminimum from have where size in(select size from have having size<>MIN(size) );

select max(size) into :smaximum from have where size in(select size from have having size<>max(size) );

select pricepct1,pricepct99,sizepct1,sizepct99 into :actual1,:actual99,:size1,:size99 from want;

quit;

data test;

merge have want;

if price=&actual1 then price=&minimum;

else if price=&actual99 then price=&maximum;

if size=&size1 then size=&sminimum;

else if size=&size99 then size=&smaximum;

run;

Thanks,

Shiva

Occasional Contributor
Posts: 17

Re: winsorization

Hi Shiva,

Thanks for the help. It is something close to what i want. However, I do not want an output or mean yet. I just want to winsorize or eliminate those extreme observations in my variable column. There are still things that I want to do before applying proc univariate. Is there any other way to winsorize or eliminate the variables below the 1st percentile and 99th percentile? Thanks!

Clifton

Frequent Contributor
Posts: 101

Re: winsorization

Posted in reply to rilatotoro

Shiva did winsorize the data. The test dataset that is created contains the winsorized values. For example, look at price. The lowest value of 2 was replaced by the next nearest value of 10. The highest value of 55 was replaced with the next nearest value of 45. That's winsorization.

priceprice_winsorized
210
1010
2020
2020
2020
3030
3030
3838
3939
4545
4545
5545
Occasional Contributor
Posts: 17

Re: winsorization

Posted in reply to SAS_Bigot

Yep I did  know that the data were winsorized. But how did you get the price_winsorized column? I want it to be sort of like side by side with my original column.

Frequent Contributor
Posts: 101

Re: winsorization

Posted in reply to rilatotoro

I just merged the datasets have and test and renamed price from test price_winsorized.

Solution
‎05-02-2012 10:06 AM
Super Contributor
Posts: 1,636

Re: winsorization

Posted in reply to rilatotoro

Is this what you want?

data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 2
30 2
45 1
38 3
20 3
39 3
46 1
;
run;

proc univariate data=have noprint;
   var price size;
   output out=temp pctlpts  = 1 99 pctlpre  = price size pctlname = pct1 pct99;
run;
/* create 4 macro variables with the 4 interested values */
data _null_;
  set temp;
  call symputx('price1',pricepct1);
  call symputx('price99',pricepct99);
  call symputx('size1',sizepct1);
  call symputx('size99',sizepct99);
run;
%put _user_;
data want;
   set have;
   where (&price1< price<&price99) and (&size1 <size< &size99);
run;
proc print;run;
                                        Obs    price    size

                                         1       10       3
                                         2       20       2
                                         3       30       2
                                         4       38       3
                                         5       20       3
                                         6       39       3

Linlin

Frequent Contributor
Posts: 101

Re: winsorization

Posted in reply to rilatotoro

Proc univariate will calculate trimmed and winsorized means. Here's the section that talks about robust estimators and explains winsorization.

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univaria...

Valued Guide
Posts: 765

Re: winsorization

Posted in reply to rilatotoro

hi ... another idea (limited by how many variable values will fit into &PRICE and &SIZE) ...

data have;
input price size @@;
cards;
10 3 20 5 30 5 45 1 2 3 20 2 30 2 45 1 38 3 20 3 39 3 46 1
;

proc sql;
select price, size into Smiley Tonguerice separated by ',', :size separated by ',' from have;
select pctl(1,&price), pctl(99,&price), pctl(1,&size), pctl(99,&size) into Smiley Tongue1, Smiley Tongue99, :s1, :s99 from have (obs=1);
create table winsor as
select * from have where price gt &p1 and price lt &p99 and
                         size  gt &s1 and size  lt &s99;
quit;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 4069 views
  • 6 likes
  • 5 in conversation