_maldini_
Barite | Level 11

I am trying to use code I found in an old thread to winsorize variables. This code runs w/o error, but the means for the winsorized vars (wvar) are no different than the unwinsorized vars.

 

There are a few things I don't understand (Questions also embedded in the syntax).

  1. What is _n_? What role is it playing?
  2. Are these the winsorized variables (i.e. wvar1  wvar2  wvar3)?
    1. If so, the means are no different than the means of the variables prior to the "winsorization"
  3. How would I interpret this "min(max(val{_V},wlo{_V}),whi{_V});"? I'm confused by all these embedded arrays. I'm a relative beginner w/ SAS.  
  4. Are these the "min" and "max" functions here? What is the purpose of including them here?
proc univariate data=have noprint;
   var var1 var2 var3;
   /* each pctlpre= prefix ends in an underscore, so the output variables are named __var1_10, __var1_90, etc. */
   output out=_testing pctlpts=10 90 pctlpre=__var1_ __var2_ __var3_;
run;

data want;
   set have;
   if _n_=1 then set _testing;
   * What is _n_? What role is it playing here?;

   array wlo  {*} __var1_10 __var2_10 __var3_10;
   array whi  {*} __var1_90 __var2_90 __var3_90;
   array wval {*} wvar1 wvar2 wvar3;
   * Are these the variables that are supposed to contain the winsorized means?;
   array val  {*} var1 var2 var3;
   do _V=1 to dim(val);
      wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});
      * Any help interpreting this?;
      * Are these the "min" and "max" functions here? What is the purpose of including them here?;
   end;
run;

Thank you very much for all your continued support. It is greatly appreciated. I would be lost w/o this forum. 

6 REPLIES
FreelanceReinh
Jade | Level 19

Hi @_maldini_,

 

In this brand new thread I've just pointed out that calculating Winsorized means "manually" (i.e. in a data step, possibly using a macro) can lead to different results than using PROC UNIVARIATE's WINSORIZED= option. I think this option was introduced only in SAS 9 [EDIT: no, I was wrong, sorry, it was introduced in SAS version 7], so that programs and macros for manual calculation might be obsolete, unless you need the Winsorized data (not only the means) or you insist on using a different algorithm.
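
For reference, a minimal sketch of what using that option might look like. It assumes the data set `have` and the variables var1-var3 from the question, and it uses a proportion of 0.1 in each tail as a rough counterpart to the 10th/90th percentile cutoffs in the posted code. As noted above, the results need not match a manual percentile-clamping approach.

```sas
/* Sketch only: data set and variable names are taken from the question. */
/* WINSORIZED=0.1 requests Winsorized means with 10% of the observations */
/* Winsorized in each tail.                                              */
ods select WinsorizedMeans;          /* show only the Winsorized means table */
proc univariate data=have winsorized=0.1;
   var var1 var2 var3;
run;
```

This reports the Winsorized means only; it does not create a data set containing the Winsorized values themselves.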

 

I can take a closer look at your program tomorrow (CET) if you still want to use it.

_maldini_
Barite | Level 11

@FreelanceReinh 

 

<unless you...you insist on using a different algorithm>

 

I certainly do not. I am looking for the easiest solution to this problem. I've never used a macro in SAS however...

 

Thanks for your help!

 

 

Rick_SAS
SAS Super FREQ

@FreelanceReinh Thank you for acknowledging that a naive Winsorization, especially in the presence of missing values or repeated values, can lead to wrong answers. I have mentioned this in other threads, but it tends to be overlooked.

 

I believe that the correct way to Winsorize data is given in my article "How to Winsorize Data in SAS." When testing "manual" methods, be sure to use a data set that has missing and repeated values, such as Sashelp.Heart.
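
As a rough illustration of such a test (a sketch, not the method from the article), the steps below clamp one variable from Sashelp.Heart at its 10th and 90th percentiles and print the result next to the value from the WINSORIZED= option. The choice of Cholesterol, the 10/90 cutoffs, and the names `_pctl`, `_manual`, and `wChol` are assumptions made for the example. One pitfall it guards against: the MIN and MAX functions ignore missing arguments, so an unguarded `min(max(x, lo), hi)` turns a missing x into the lower cutoff.

```sas
/* Percentile cutoffs for one variable (hypothetical output data set _pctl) */
proc univariate data=sashelp.heart noprint;
   var Cholesterol;
   output out=_pctl pctlpts=10 90 pctlpre=p;   /* creates p10 and p90 */
run;

data _manual;
   set sashelp.heart(keep=Cholesterol);
   /* read the one-row cutoff data set once; variables read by SET are */
   /* retained, so p10 and p90 are available on every observation      */
   if _n_ = 1 then set _pctl;
   /* clamp only nonmissing values so that missing stays missing */
   if not missing(Cholesterol) then
      wChol = min(max(Cholesterol, p10), p90);
run;

/* Compare the original and manually clamped variable */
proc means data=_manual n nmiss mean;
   var Cholesterol wChol;
run;

/* Value from the built-in option, for comparison */
ods select WinsorizedMeans;
proc univariate data=sashelp.heart winsorized=0.1;
   var Cholesterol;
run;
```

If the two means differ, that is consistent with the point made above: percentile clamping and the built-in Winsorized mean are not necessarily the same calculation.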

Reeza
Super User

I highly recommend reading through the first link @Ksharp posted.

 

Rick_SAS
SAS Super FREQ

To the OP: Why are you wanting to Winsorize the data? What problem are you trying to solve?
