Hi does anyone know how to winsorize variables in a data? Let say you have this variable called size and prices in your data and you want them to be 'winsorized' at the 1st and 99th percentile? thanks!
Is this what you want?
data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 2
30 2
45 1
38 3
20 3
39 3
46 1
;
run;
proc univariate data=have noprint;
var price size;
output out=temp pctlpts = 1 99 pctlpre = price size pctlname = pct1 pct99;
run;
/* create 4 macro variables with the 4 interested values */
data _null_;
set temp;
call symputx('price1',pricepct1);
call symputx('price99',pricepct99);
call symputx('size1',sizepct1);
call symputx('size99',sizepct99);
run;
%put _user_;
data want;
set have;
where (&price1< price<&price99) and (&size1 <size< &size99);
run;
proc print;run;
Obs price size
1 10 3
2 20 2
3 30 2
4 38 3
5 20 3
6 39 3
Linlin
Hi,
Is this the output you required...
data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 5
30 5
45 1
38 3
20 5
39 5
55 1
;
run;
proc univariate data=have noprint;
var price size;
output out=want mean=mean std=std pctlpts = 1 99 pctlpre = price size
pctlname = pct1 pct99;
run;
proc sql;
select min(price) into :minimum from have where price in(select price from have having price<>MIN(price) );
select max(price) into :maximum from have where price in(select price from have having price<>max(price) );
select min(size) into :sminimum from have where size in(select size from have having size<>MIN(size) );
select max(size) into :smaximum from have where size in(select size from have having size<>max(size) );
select pricepct1,pricepct99,sizepct1,sizepct99 into :actual1,:actual99,:size1,:size99 from want;
quit;
data test;
merge have want;
if price=&actual1 then price=&minimum;
else if price=&actual99 then price=&maximum;
if size=&size1 then size=&sminimum;
else if size=&size99 then size=&smaximum;
run;
Thanks,
Shiva
Hi Shiva,
Thanks for the help. It is something close to what i want. However, I do not want an output or mean yet. I just want to winsorize or eliminate those extreme observations in my variable column. There are still things that I want to do before applying proc univariate. Is there any other way to winsorize or eliminate the variables below the 1st percentile and 99th percentile? Thanks!
Clifton
Shiva did winsorize the data. The test dataset that is created contains the winsorized values. For example, look at price. The lowest value of 2 was replaced by the next nearest value of 10. The highest value of 55 was replaced with the next nearest value of 45. That's winsorization.
price | price_winsorized |
2 | 10 |
10 | 10 |
20 | 20 |
20 | 20 |
20 | 20 |
30 | 30 |
30 | 30 |
38 | 38 |
39 | 39 |
45 | 45 |
45 | 45 |
55 | 45 |
Yep I did know that the data were winsorized. But how did you get the price_winsorized column? I want it to be sort of like side by side with my original column.
I just merged the datasets have and test and renamed price from test price_winsorized.
Is this what you want?
data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 2
30 2
45 1
38 3
20 3
39 3
46 1
;
run;
proc univariate data=have noprint;
var price size;
output out=temp pctlpts = 1 99 pctlpre = price size pctlname = pct1 pct99;
run;
/* create 4 macro variables with the 4 interested values */
data _null_;
set temp;
call symputx('price1',pricepct1);
call symputx('price99',pricepct99);
call symputx('size1',sizepct1);
call symputx('size99',sizepct99);
run;
%put _user_;
data want;
set have;
where (&price1< price<&price99) and (&size1 <size< &size99);
run;
proc print;run;
Obs price size
1 10 3
2 20 2
3 30 2
4 38 3
5 20 3
6 39 3
Linlin
Proc univariate will calculate trimmed and winsorized means. Here's the section that talks about robust estimators and explains winsorization.
hi ... another idea (limited by how many variable values will fit into &PRICE and &SIZE) ...
data have;
input price size @@;
cards;
10 3 20 5 30 5 45 1 2 3 20 2 30 2 45 1 38 3 20 3 39 3 46 1
;
proc sql;
select price, size into :price separated by ',', :size separated by ',' from have;
select pctl(1,&price), pctl(99,&price), pctl(1,&size), pctl(99,&size) into :p1, :p99, :s1, :s99 from have (obs=1);
create table winsor as
select * from have where price gt &p1 and price lt &p99 and
size gt &s1 and size lt &s99;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.