BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
rilatotoro
Calcite | Level 5

Hi does anyone know how to winsorize variables in a data? Let say you have this variable called size and prices in your data and you want them to be 'winsorized' at the 1st and 99th percentile? thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

Is this what you want?

data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 2
30 2
45 1
38 3
20 3
39 3
46 1
;
run;

proc univariate data=have noprint;
   var price size;
   output out=temp pctlpts  = 1 99 pctlpre  = price size pctlname = pct1 pct99;
run;
/* create 4 macro variables with the 4 interested values */
data _null_;
  set temp;
  call symputx('price1',pricepct1);
  call symputx('price99',pricepct99);
  call symputx('size1',sizepct1);
  call symputx('size99',sizepct99);
run;
%put _user_;
data want;
   set have;
   where (&price1< price<&price99) and (&size1 <size< &size99);
run;
proc print;run;
                                        Obs    price    size

                                         1       10       3
                                         2       20       2
                                         3       30       2
                                         4       38       3
                                         5       20       3
                                         6       39       3

Linlin

View solution in original post

8 REPLIES 8
shivas
Pyrite | Level 9

Hi,

Is this the output you required...

data have;

input price size;

cards;

10 3

20 5

30 5

45 1

2 3

20 5

30 5

45 1

38 3

20 5

39 5

55 1

;

run;

proc univariate data=have noprint;

   var price size;

   output out=want mean=mean std=std pctlpts  = 1 99 pctlpre  = price size

                                       pctlname = pct1 pct99;

run;

proc sql;

select min(price) into :minimum from have where price in(select price from have having price<>MIN(price) );

select max(price) into :maximum from have where price in(select price from have having price<>max(price) );

select min(size) into :sminimum from have where size in(select size from have having size<>MIN(size) );

select max(size) into :smaximum from have where size in(select size from have having size<>max(size) );

select pricepct1,pricepct99,sizepct1,sizepct99 into :actual1,:actual99,:size1,:size99 from want;

quit;

data test;

merge have want;

if price=&actual1 then price=&minimum;

else if price=&actual99 then price=&maximum;

if size=&size1 then size=&sminimum;

else if size=&size99 then size=&smaximum;

run;

Thanks,

Shiva

rilatotoro
Calcite | Level 5

Hi Shiva,

Thanks for the help. It is something close to what i want. However, I do not want an output or mean yet. I just want to winsorize or eliminate those extreme observations in my variable column. There are still things that I want to do before applying proc univariate. Is there any other way to winsorize or eliminate the variables below the 1st percentile and 99th percentile? Thanks!

Clifton

FloydNevseta
Pyrite | Level 9

Shiva did winsorize the data. The test dataset that is created contains the winsorized values. For example, look at price. The lowest value of 2 was replaced by the next nearest value of 10. The highest value of 55 was replaced with the next nearest value of 45. That's winsorization.

priceprice_winsorized
210
1010
2020
2020
2020
3030
3030
3838
3939
4545
4545
5545
rilatotoro
Calcite | Level 5

Yep I did  know that the data were winsorized. But how did you get the price_winsorized column? I want it to be sort of like side by side with my original column.

FloydNevseta
Pyrite | Level 9

I just merged the datasets have and test and renamed price from test price_winsorized.

Linlin
Lapis Lazuli | Level 10

Is this what you want?

data have;
input price size;
cards;
10 3
20 5
30 5
45 1
2 3
20 2
30 2
45 1
38 3
20 3
39 3
46 1
;
run;

proc univariate data=have noprint;
   var price size;
   output out=temp pctlpts  = 1 99 pctlpre  = price size pctlname = pct1 pct99;
run;
/* create 4 macro variables with the 4 interested values */
data _null_;
  set temp;
  call symputx('price1',pricepct1);
  call symputx('price99',pricepct99);
  call symputx('size1',sizepct1);
  call symputx('size99',sizepct99);
run;
%put _user_;
data want;
   set have;
   where (&price1< price<&price99) and (&size1 <size< &size99);
run;
proc print;run;
                                        Obs    price    size

                                         1       10       3
                                         2       20       2
                                         3       30       2
                                         4       38       3
                                         5       20       3
                                         6       39       3

Linlin

FloydNevseta
Pyrite | Level 9

Proc univariate will calculate trimmed and winsorized means. Here's the section that talks about robust estimators and explains winsorization.

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_univaria...

MikeZdeb
Rhodochrosite | Level 12

hi ... another idea (limited by how many variable values will fit into &PRICE and &SIZE) ...

data have;
input price size @@;
cards;
10 3 20 5 30 5 45 1 2 3 20 2 30 2 45 1 38 3 20 3 39 3 46 1
;

proc sql;
select price, size into :price separated by ',', :size separated by ',' from have;
select pctl(1,&price), pctl(99,&price), pctl(1,&size), pctl(99,&size) into :p1, :p99, :s1, :s99 from have (obs=1);
create table winsor as
select * from have where price gt &p1 and price lt &p99 and
                         size  gt &s1 and size  lt &s99;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 5607 views
  • 7 likes
  • 5 in conversation