## winsorization

Solved
Occasional Contributor
Posts: 17

# winsorization

Hi does anyone know how to winsorize variables in a data? Let say you have this variable called size and prices in your data and you want them to be 'winsorized' at the 1st and 99th percentile? thanks!

Accepted Solutions
Solution
‎05-02-2012 10:06 AM
Super Contributor
Posts: 1,636

## Re: winsorization

Posted in reply to rilatotoro

Is this what you want?

data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 2
30 2
45 1
38 3
20 3
39 3
46 1
;
run;

proc univariate data=have noprint;
var price size;
output out=temp pctlpts  = 1 99 pctlpre  = price size pctlname = pct1 pct99;
run;
/* create 4 macro variables with the 4 interested values */
data _null_;
set temp;
call symputx('price1',pricepct1);
call symputx('price99',pricepct99);
call symputx('size1',sizepct1);
call symputx('size99',sizepct99);
run;
%put _user_;
data want;
set have;
where (&price1< price<&price99) and (&size1 <size< &size99);
run;
proc print;run;
Obs    price    size

1       10       3
2       20       2
3       30       2
4       38       3
5       20       3
6       39       3

Linlin

All Replies
Super Contributor
Posts: 350

## Re: winsorization

Posted in reply to rilatotoro

Hi,

Is this the output you required...

data have;

input price size;

cards;

10 3

20 5

30 5

45 1

2 3

20 5

30 5

45 1

38 3

20 5

39 5

55 1

;

run;

proc univariate data=have noprint;

var price size;

output out=want mean=mean std=std pctlpts  = 1 99 pctlpre  = price size

pctlname = pct1 pct99;

run;

proc sql;

select min(price) into :minimum from have where price in(select price from have having price<>MIN(price) );

select max(price) into :maximum from have where price in(select price from have having price<>max(price) );

select min(size) into :sminimum from have where size in(select size from have having size<>MIN(size) );

select max(size) into :smaximum from have where size in(select size from have having size<>max(size) );

select pricepct1,pricepct99,sizepct1,sizepct99 into :actual1,:actual99,:size1,:size99 from want;

quit;

data test;

merge have want;

if price=&actual1 then price=&minimum;

else if price=&actual99 then price=&maximum;

if size=&size1 then size=&sminimum;

else if size=&size99 then size=&smaximum;

run;

Thanks,

Shiva

Occasional Contributor
Posts: 17

## Re: winsorization

Hi Shiva,

Thanks for the help. It is something close to what i want. However, I do not want an output or mean yet. I just want to winsorize or eliminate those extreme observations in my variable column. There are still things that I want to do before applying proc univariate. Is there any other way to winsorize or eliminate the variables below the 1st percentile and 99th percentile? Thanks!

Clifton

Frequent Contributor
Posts: 103

## Re: winsorization

Posted in reply to rilatotoro

Shiva did winsorize the data. The test dataset that is created contains the winsorized values. For example, look at price. The lowest value of 2 was replaced by the next nearest value of 10. The highest value of 55 was replaced with the next nearest value of 45. That's winsorization.

 price price_winsorized 2 10 10 10 20 20 20 20 20 20 30 30 30 30 38 38 39 39 45 45 45 45 55 45
Occasional Contributor
Posts: 17

## Re: winsorization

Posted in reply to SAS_Bigot

Yep I did  know that the data were winsorized. But how did you get the price_winsorized column? I want it to be sort of like side by side with my original column.

Frequent Contributor
Posts: 103

## Re: winsorization

Posted in reply to rilatotoro

I just merged the datasets have and test and renamed price from test price_winsorized.

Solution
‎05-02-2012 10:06 AM
Super Contributor
Posts: 1,636

## Re: winsorization

Posted in reply to rilatotoro

Is this what you want?

data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 2
30 2
45 1
38 3
20 3
39 3
46 1
;
run;

proc univariate data=have noprint;
var price size;
output out=temp pctlpts  = 1 99 pctlpre  = price size pctlname = pct1 pct99;
run;
/* create 4 macro variables with the 4 interested values */
data _null_;
set temp;
call symputx('price1',pricepct1);
call symputx('price99',pricepct99);
call symputx('size1',sizepct1);
call symputx('size99',sizepct99);
run;
%put _user_;
data want;
set have;
where (&price1< price<&price99) and (&size1 <size< &size99);
run;
proc print;run;
Obs    price    size

1       10       3
2       20       2
3       30       2
4       38       3
5       20       3
6       39       3

Linlin

Frequent Contributor
Posts: 103

## Re: winsorization

Posted in reply to rilatotoro

Proc univariate will calculate trimmed and winsorized means. Here's the section that talks about robust estimators and explains winsorization.

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univaria...

Valued Guide
Posts: 765

## Re: winsorization

Posted in reply to rilatotoro

hi ... another idea (limited by how many variable values will fit into &PRICE and &SIZE) ...

data have;
input price size @@;
cards;
10 3 20 5 30 5 45 1 2 3 20 2 30 2 45 1 38 3 20 3 39 3 46 1
;

proc sql;
select price, size into rice separated by ',', :size separated by ',' from have;
select pctl(1,&price), pctl(99,&price), pctl(1,&size), pctl(99,&size) into 1, 99, :s1, :s99 from have (obs=1);
create table winsor as
select * from have where price gt &p1 and price lt &p99 and
size  gt &s1 and size  lt &s99;
quit;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
• 8 replies
• 4381 views
• 6 likes
• 5 in conversation