Help using Base SAS procedures

how to winsorize variable using SAS

Accepted Solution Solved
Reply
Contributor
Posts: 38
Accepted Solution

how to winsorize variable using SAS

Hi,

I did know how to winsorize in Stata, but how to do it in SAS.

For example, I have three variables: size, sales, total assets

I want winsorize at the 1% and 99%, i.e. values that are less than the value at 1% are replaced by the value at 1%, and values that are greater than the value at 99% are replaced by the value at 99%. I don't want obs. with outliers are deleted.

Thanks.



Accepted Solutions
Solution
‎08-15-2012 08:58 PM
Valued Guide
Posts: 797

Re: how to winsorize variable using SAS

[ Edited ]

That's because in this small sample the 99%ile and 1%ile are at the max and min of the variables, so the winsorizing values were already at the extremes.  Add more observations or change the percentiles (but if you change the percentiles, modify the names of the variables in the WLO and WHI arrays correspondingly.

 

Also you mistakenly generated the 10%ile instead of the first pecentile in the univariate procedure but that had no effect because the data step was looking for __size1, (but the actual variable created was __size10).

 

I will change to the 10th and 90th percentiles which should produce results in your example.

 

Editor's note: including original code solution in this message for convenience:

 

Two steps:

 1. find the winsor low and high percentiles (PROC UNIVARIATE)

 2. truncate outliers to those values (DATA step)

 

%let L=10;    %* 10th percentile *;
%let H=%eval(100 - &L);   %* 90th percentile*;
proc univariate data=have noprint;
   var size sales assets;
   output out=_winsor   pctlpts=&L  &H    
   pctlpre=__size  __sales  __assets;
run;
data want (drop=__:);
  set have;
  if _n_=1 then set _winsor;
  array wlo  {*} __size&L  __sales&L   __assets&L;
  array whi  {*} __size&H __sales&H __assets&H;
  array wval {*} wsize wsales wassets;
  array val   {*} size sales assets;
  do _V=1 to dim(val);
     wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});
  end;
run;

View solution in original post


All Replies
Valued Guide
Posts: 797

Re: how to winsorize variable using SAS

lTwo steps:

   1. find the winsor low and high percentiles

   2. truncate outliers to those values

%let L=10;    %* 10th percentile *;

%let H=%eval(100 - &L);   %* 90th percentile*;

proc univariate data=have noprint;

   var size sales assets;

   output out=_winsor   pctlpts=&L  &H     pctlpre=__size  __sales  __assets;

run;

data want (drop=_Smiley Happy;

  set have;

  if _n_=1 then set _winsor;

  array wlo  {*} __size&L  __sales&L   __assets&L;

  array whi  {*} __size&H __sales&H __assets&H;

  array wval {*} wsize wsales wassets;

  array val   {*} size sales assets;

  do _V=1 to dim(val);

     wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});

  end;

run;

 

This can be easily macro-ized so that you can make parameters of the variable list, input/output data sets, and percentiles.  I suspect you may be a user of WRDS, given you are dealing with company-based financial data.  If so, you can find a more generalized winsorize macro via the RESEARCH tab on the WRDS web site.

Contributor
Posts: 38

Re: how to winsorize variable using SAS

Hi mkeintz,

Problem again.

It seems that the winsor procedure does not make difference since values in want dataset(after winsorizing) is the same as have dataset(before winsorizing).

For example:

data have;

input size sales assets;

cards;

10 3 15

20 5 20

30 5 23.2

45 19 14.3

2 3 16.4

20 2 170

30 2 3

45 1 5

100 3 10

20 3 25

39 3 30

46 1 12

;

run;

proc univariate data=have noprint;

   var size sales assets;

   output out=_winsor   pctlpts=10 99     pctlpre=__size  __sales  __assets;

run;

data want (drop=_Smiley Happy;

  set have;

  if _n_=1 then set _winsor;

  array wlo  {*} __size1 __sales1 __assets1;

  array whi  {*} __size99 __sales99 __assets99;

  array wval {*} wsize wsales wassets;

  array val   {*} size sales assets;

  do _V=1 to dim(val);

     wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});

  end;

run;

Solution
‎08-15-2012 08:58 PM
Valued Guide
Posts: 797

Re: how to winsorize variable using SAS

[ Edited ]

That's because in this small sample the 99%ile and 1%ile are at the max and min of the variables, so the winsorizing values were already at the extremes.  Add more observations or change the percentiles (but if you change the percentiles, modify the names of the variables in the WLO and WHI arrays correspondingly.

 

Also you mistakenly generated the 10%ile instead of the first pecentile in the univariate procedure but that had no effect because the data step was looking for __size1, (but the actual variable created was __size10).

 

I will change to the 10th and 90th percentiles which should produce results in your example.

 

Editor's note: including original code solution in this message for convenience:

 

Two steps:

 1. find the winsor low and high percentiles (PROC UNIVARIATE)

 2. truncate outliers to those values (DATA step)

 

%let L=10;    %* 10th percentile *;
%let H=%eval(100 - &L);   %* 90th percentile*;
proc univariate data=have noprint;
   var size sales assets;
   output out=_winsor   pctlpts=&L  &H    
   pctlpre=__size  __sales  __assets;
run;
data want (drop=__:);
  set have;
  if _n_=1 then set _winsor;
  array wlo  {*} __size&L  __sales&L   __assets&L;
  array whi  {*} __size&H __sales&H __assets&H;
  array wval {*} wsize wsales wassets;
  array val   {*} size sales assets;
  do _V=1 to dim(val);
     wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});
  end;
run;
Contributor
Posts: 38

Re: how to winsorize variable using SAS

It's my fault. Sorry for that.

Thanks anyway.

Frequent Contributor
Posts: 101

Re: how to winsorize variable using SAS

Hello,

I found this answer very helpful, but I have a question for how to make it a little more specific to my needs.

I have data that I need to winsorize by quarter.  So, I have a large dataset with lots of values for each quarter, and I want to winsorize the top and bottom percent within each time period.

How could I modify the above code to make that happen?

Thanks!

John

Contributor
Posts: 57

Re: how to winsorize variable using SAS

Hello Folks!

I have also a similar problem here! I have one continuous variable and it was not normally distributed - So I tried to take the square root of the variable and then winsorize at 5% and 95% - It was not normal again! So I went to winsorize at %7 and %97 - Then the variable was normally distributed!

But, all the original significant associations from PROC GLM disappeared! can you guys please suggest me any solution for this?

Respected Advisor
Posts: 4,641

Re: how to winsorize variable using SAS

Your question appears as an ANSWERED question on the Forum, which it is not. Post it as a new question (Discussion) on the SAS Statistical Procedures Community to attract more attention. - PG

PG
Contributor
Posts: 57

Re: how to winsorize variable using SAS

Sure! Thank you!

Respected Advisor
Posts: 3,124

Re: how to winsorize variable using SAS

A Google search brings me the following links, you may find them relevant:

Google Groups

http://www.wrds.us/index.php/repository/view/1

Haikuo

Super User
Posts: 9,671

Re: how to winsorize variable using SAS

Are you try to calculate Winsorized mean ? It is an example from documentation.

title 'Robust Estimates for Blood Pressure Data';
ods select TrimmedMeans WinsorizedMeans RobustScale;
proc univariate data=BPressure trimmed=1 .1
winsorized=.1 robustscale;
var Systolic;
run;

Ksharp

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 11497 views
  • 7 likes
  • 7 in conversation