Programming the statistical procedures from SAS

winsorized mean and proc univariate ods

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

winsorized mean and proc univariate ods

Hi Folks

 

I'm running the below proc in order to create a unique dataset including for each provincia both the winsorized and trimmed mean ,for variable age and tenure,  with both 10% and %level

 

So I would expect a dataset result like

 

Province AgeWinsored10% AgeTrimmed10% AgeWinsored%  AgTrimmed5%

 ex. Barcelona  x1                       x2                      x3                     x4

  Madrid ec...

 

The code hower is not working how it should be

 

Any Idea about how I should correct the code ??

 

Many thanks

 

 

ods select none;
proc univariate winsor= 0.10 trim=0.1 winsor=0.05 trim=0.05 data=reduced;
var age tenure ;
by provincia;
ods output WinsorizedMeans=winsorizedmean10(keep=WinsorizedMean) TrimmedMeans=trimmedMean10 (keep=TrimmedMean) 
WinsorizedMeans=winsorizedmean5(keep=WinsorizedMean) TrimmedMeans=trimmedMean5 (keep=TrimmedMean) ;
run ;
ods trace off;

 


Accepted Solutions
Solution
‎05-23-2016 11:00 AM
Trusted Advisor
Posts: 1,115

Re: winsorized mean and proc univariate ods

[ Edited ]

I don't think this is possible directly in PROC UNIVARIATE, but, of course, you can combine the two datasets as you can combine any other datasets.

 

Example ("vertical" concatenation):

data allmeans;
set wmeans tmeans indsname=dsn;
length meantype $10;
if scan(dsn,2,'.')=:'W' then meantype='Winsorized';
else meantype='trimmed';
run;

Alternatively, you could join the datasets ("horizontally") so as to have Winsorized and trimmed means as two distinct variables.

Edit: Here's how:

proc sql;
create table allmeans2 as
select a.provincia, a.varname, round(a.halfp, 2.5) as pct,
       a.mean as wmean label='Winsorized mean',
       b.mean as tmean label='Trimmed mean'
from wmeans a, tmeans b
where a.provincia=b.provincia & a.varname=b.varname
      & round(a.halfp, 2.5)=round(b.halfp, 2.5)
order by provincia, varname, pct;
quit;

proc print data=allmeans2 noobs label;
run;

View solution in original post


All Replies
Trusted Advisor
Posts: 1,115

Re: winsorized mean and proc univariate ods

[ Edited ]

Hi @dcortell,

 

Try this:

proc univariate winsor=0.1 0.05 trim=0.1 0.05 data=reduced;
var age tenure;
by provincia;
ods output WinsorizedMeans = wmeans(keep=provincia varname halfp mean)
           TrimmedMeans    = tmeans(keep=provincia varname halfp mean);
run;

The resulting datasets WMEANS and TMEANS will contain one observation per combination of

  • (formatted) value of PROVINCIA
  • analysis variable (age, tenure)
  • percentage (10, 5), to be identified using variable HALFP

Variable MEAN will contain the Winsorized or trimmed means, respectively.

 

Edit: Feel free to use different names for the output datasets WMEANS and TMEANS.

Contributor
Posts: 22

Re: winsorized mean and proc univariate ods

@FreelanceReinhard many thanks for the advice!! There is any way to combine the results of the wins and trims averages in a unique dataset instead that two separate ?

 

Thansk!!

 

D

Contributor
Posts: 22

Re: winsorized mean and proc univariate ods

@FreelanceReinhard many thanks for the advice!! There is any way to combine the results of the wins and trims averages in a unique dataset instead that two separate ? Thansk!! D
Solution
‎05-23-2016 11:00 AM
Trusted Advisor
Posts: 1,115

Re: winsorized mean and proc univariate ods

[ Edited ]

I don't think this is possible directly in PROC UNIVARIATE, but, of course, you can combine the two datasets as you can combine any other datasets.

 

Example ("vertical" concatenation):

data allmeans;
set wmeans tmeans indsname=dsn;
length meantype $10;
if scan(dsn,2,'.')=:'W' then meantype='Winsorized';
else meantype='trimmed';
run;

Alternatively, you could join the datasets ("horizontally") so as to have Winsorized and trimmed means as two distinct variables.

Edit: Here's how:

proc sql;
create table allmeans2 as
select a.provincia, a.varname, round(a.halfp, 2.5) as pct,
       a.mean as wmean label='Winsorized mean',
       b.mean as tmean label='Trimmed mean'
from wmeans a, tmeans b
where a.provincia=b.provincia & a.varname=b.varname
      & round(a.halfp, 2.5)=round(b.halfp, 2.5)
order by provincia, varname, pct;
quit;

proc print data=allmeans2 noobs label;
run;
Contributor
Posts: 22

Re: winsorized mean and proc univariate ods

Hi FreelanceReinhard

 

Quick question. I was bronswing in the web for material  about  the " halfp " variable you have used to catch the trim and wins Respectively trimmered/winsored percentage but seems that this " halfp" variable is something not generated automatically in the trim and wins tables

 

Would you be so gentle to explain where did you get it or share any material on that?

 

Many thanks

D

Trusted Advisor
Posts: 1,115

Re: winsorized mean and proc univariate ods

Is HALFP not contained in the ODS output datasets of your PROC UNIVARIATE step? With my SAS 9.4 it is created by default.

 

Basically, for Winsorized and trimmed means you specify a number k of observations to be Winsorized (or trimmed, respectively) at the high and the low end of the empirical distribution. When you request a Winsorized (or trimmed) mean by specifying a percentage p such as 0.1 (i.e. 10%), k "is equal to the smallest integer that is greater than or equal to np, where n is the number of observations" (documentation), i.e. it's rounded up to the next integer.

 

The actual percentage of Winsorized (trimmed) observations is 2k/n. (k/n for the high end plus k/n for the low end.) Variable HALFP contains the value k/n * 100 (hence the name), which is typically slightly larger than 100p because of the rounding mentioned above. Please note that n refers to the number of observations with non-missing values of the analysis variable (in the respective BY group, if any), hence it can vary between analysis variables.

Contributor
Posts: 22

Re: winsorized mean and proc univariate ods

 

 

Thank you for the explanation. As you can see in my ods table the halfp variable is named "Percent trimmed in tail" (at least it seems what the code you posted extract as "Halfp")

 

Maybe a matter of labels only??

 

res.png

 

Trusted Advisor
Posts: 1,115

Re: winsorized mean and proc univariate ods

Exactly. "Percent Winsorized in Tail" and "Percent Trimmed in Tail" are the labels of HALFP in the respective ODS output datasets.

Contributor
Posts: 22

Re: winsorized mean and proc univariate ods

Last question (I hope Smiley Happy )

 

Would be really helpful If you have any link to share wher eI cna find the real variable names associated to each value of the ods table (like halfp) , in order to be able to extract also the others data too

 

For what you have seen I just have labels that do not correspond to the real name of the variable

 

Thanks

Davide

SAS Super FREQ
Posts: 3,547

Re: winsorized mean and proc univariate ods

1) Use PROC CONTENTS SHORT to see names of variables.
2) If you want to also see observations, use PROC PRINT. The default table shows variable names, not labels.
3) If you bring up a table view in Windowing Environment (DMS), click View-->Column Names
Trusted Advisor
Posts: 1,115

Re: winsorized mean and proc univariate ods

Personally, I virtually always use listing output of PROC PRINT etc. to look into datasets. In PROC PRINT, for example, variable names are displayed by default and labels only on request. I don't know how you usually view datasets. If you use VIEWTABLE mode, you can switch from labels to names via the View menu (which has items "Column Labels" and "Column Names"). PROC CONTENTS (with default settings) displays both, regardless of the output destination.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 630 views
  • 7 likes
  • 3 in conversation