BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dcortell
Pyrite | Level 9

Hi Folks

 

I'm running the below proc in order to create a unique dataset including for each provincia both the winsorized and trimmed mean ,for variable age and tenure,  with both 10% and %level

 

So I would expect a dataset result like

 

Province AgeWinsored10% AgeTrimmed10% AgeWinsored%  AgTrimmed5%

 ex. Barcelona  x1                       x2                      x3                     x4

  Madrid ec...

 

The code hower is not working how it should be

 

Any Idea about how I should correct the code ??

 

Many thanks

 

 

ods select none;
proc univariate winsor= 0.10 trim=0.1 winsor=0.05 trim=0.05 data=reduced;
var age tenure ;
by provincia;
ods output WinsorizedMeans=winsorizedmean10(keep=WinsorizedMean) TrimmedMeans=trimmedMean10 (keep=TrimmedMean) 
WinsorizedMeans=winsorizedmean5(keep=WinsorizedMean) TrimmedMeans=trimmedMean5 (keep=TrimmedMean) ;
run ;
ods trace off;

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

I don't think this is possible directly in PROC UNIVARIATE, but, of course, you can combine the two datasets as you can combine any other datasets.

 

Example ("vertical" concatenation):

data allmeans;
set wmeans tmeans indsname=dsn;
length meantype $10;
if scan(dsn,2,'.')=:'W' then meantype='Winsorized';
else meantype='trimmed';
run;

Alternatively, you could join the datasets ("horizontally") so as to have Winsorized and trimmed means as two distinct variables.

Edit: Here's how:

proc sql;
create table allmeans2 as
select a.provincia, a.varname, round(a.halfp, 2.5) as pct,
       a.mean as wmean label='Winsorized mean',
       b.mean as tmean label='Trimmed mean'
from wmeans a, tmeans b
where a.provincia=b.provincia & a.varname=b.varname
      & round(a.halfp, 2.5)=round(b.halfp, 2.5)
order by provincia, varname, pct;
quit;

proc print data=allmeans2 noobs label;
run;

View solution in original post

11 REPLIES 11
FreelanceReinh
Jade | Level 19

Hi @dcortell,

 

Try this:

proc univariate winsor=0.1 0.05 trim=0.1 0.05 data=reduced;
var age tenure;
by provincia;
ods output WinsorizedMeans = wmeans(keep=provincia varname halfp mean)
           TrimmedMeans    = tmeans(keep=provincia varname halfp mean);
run;

The resulting datasets WMEANS and TMEANS will contain one observation per combination of

  • (formatted) value of PROVINCIA
  • analysis variable (age, tenure)
  • percentage (10, 5), to be identified using variable HALFP

Variable MEAN will contain the Winsorized or trimmed means, respectively.

 

Edit: Feel free to use different names for the output datasets WMEANS and TMEANS.

dcortell
Pyrite | Level 9

@FreelanceReinhard many thanks for the advice!! There is any way to combine the results of the wins and trims averages in a unique dataset instead that two separate ?

 

Thansk!!

 

D

dcortell
Pyrite | Level 9
@FreelanceReinhard many thanks for the advice!! There is any way to combine the results of the wins and trims averages in a unique dataset instead that two separate ? Thansk!! D
FreelanceReinh
Jade | Level 19

I don't think this is possible directly in PROC UNIVARIATE, but, of course, you can combine the two datasets as you can combine any other datasets.

 

Example ("vertical" concatenation):

data allmeans;
set wmeans tmeans indsname=dsn;
length meantype $10;
if scan(dsn,2,'.')=:'W' then meantype='Winsorized';
else meantype='trimmed';
run;

Alternatively, you could join the datasets ("horizontally") so as to have Winsorized and trimmed means as two distinct variables.

Edit: Here's how:

proc sql;
create table allmeans2 as
select a.provincia, a.varname, round(a.halfp, 2.5) as pct,
       a.mean as wmean label='Winsorized mean',
       b.mean as tmean label='Trimmed mean'
from wmeans a, tmeans b
where a.provincia=b.provincia & a.varname=b.varname
      & round(a.halfp, 2.5)=round(b.halfp, 2.5)
order by provincia, varname, pct;
quit;

proc print data=allmeans2 noobs label;
run;
dcortell
Pyrite | Level 9

Hi FreelanceReinhard

 

Quick question. I was bronswing in the web for material  about  the " halfp " variable you have used to catch the trim and wins Respectively trimmered/winsored percentage but seems that this " halfp" variable is something not generated automatically in the trim and wins tables

 

Would you be so gentle to explain where did you get it or share any material on that?

 

Many thanks

D

FreelanceReinh
Jade | Level 19

Is HALFP not contained in the ODS output datasets of your PROC UNIVARIATE step? With my SAS 9.4 it is created by default.

 

Basically, for Winsorized and trimmed means you specify a number k of observations to be Winsorized (or trimmed, respectively) at the high and the low end of the empirical distribution. When you request a Winsorized (or trimmed) mean by specifying a percentage p such as 0.1 (i.e. 10%), k "is equal to the smallest integer that is greater than or equal to np, where n is the number of observations" (documentation), i.e. it's rounded up to the next integer.

 

The actual percentage of Winsorized (trimmed) observations is 2k/n. (k/n for the high end plus k/n for the low end.) Variable HALFP contains the value k/n * 100 (hence the name), which is typically slightly larger than 100p because of the rounding mentioned above. Please note that n refers to the number of observations with non-missing values of the analysis variable (in the respective BY group, if any), hence it can vary between analysis variables.

dcortell
Pyrite | Level 9
 

 

Thank you for the explanation. As you can see in my ods table the halfp variable is named "Percent trimmed in tail" (at least it seems what the code you posted extract as "Halfp")

 

Maybe a matter of labels only??

 

res.png

 

FreelanceReinh
Jade | Level 19

Exactly. "Percent Winsorized in Tail" and "Percent Trimmed in Tail" are the labels of HALFP in the respective ODS output datasets.

dcortell
Pyrite | Level 9

Last question (I hope 🙂 )

 

Would be really helpful If you have any link to share wher eI cna find the real variable names associated to each value of the ods table (like halfp) , in order to be able to extract also the others data too

 

For what you have seen I just have labels that do not correspond to the real name of the variable

 

Thanks

Davide

Rick_SAS
SAS Super FREQ
1) Use PROC CONTENTS SHORT to see names of variables.
2) If you want to also see observations, use PROC PRINT. The default table shows variable names, not labels.
3) If you bring up a table view in Windowing Environment (DMS), click View-->Column Names
FreelanceReinh
Jade | Level 19

Personally, I virtually always use listing output of PROC PRINT etc. to look into datasets. In PROC PRINT, for example, variable names are displayed by default and labels only on request. I don't know how you usually view datasets. If you use VIEWTABLE mode, you can switch from labels to names via the View menu (which has items "Column Labels" and "Column Names"). PROC CONTENTS (with default settings) displays both, regardless of the output destination.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 4499 views
  • 7 likes
  • 3 in conversation