Winsorize macro

Accepted Solution Solved
Reply
Contributor
Posts: 23
Accepted Solution

Winsorize macro

I cannot get my Winsorize macro to work! I have tried multiple macros (to no avail), but this one is from the most reliable source.  I have figured it out roughly using the proc univariate command, but I have a number of variables to winsorize so a macro will be much more efficient.

 

Thank you in advance for your input.

 

 

%MACRO WINSORIZE (INSET=have,OUTSET=want,SORTVAR=quarter,VARS=capxy xrdy,PERC1=1,TRIM=0);

/* List of all variables */
%let vars = %sysfunc(compbl(&vars));
%let nvars = %nwords(&vars);

/* Display Output */
%put ### START.;

/* Trimming / Winsorization Options */
%if &trim=0 %then %put ### Winsorization; %else %put ### Trimming;
%put ### Number of Variables: &nvars;
%put ### List of Variables: &vars;
options nonotes;

/* Ranking within &sortvar levels */
%put ### Sorting... ;
proc sort data=&inset; by &sortvar; run;

/* 2-tail winsorization/trimming */
%let perc2 = %eval(100-&perc1);

%let var2 = %sysfunc(tranwrd(&vars,%str( ),%str(__ )))__;
%let var_p1 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc1 )))__&perc1 ;
%let var_p2 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc2 )))__&perc2 ;

/* Calculate upper and lower percentiles */
proc univariate data=&inset noprint;
by &sortvar;
var &vars;
output out=_perc pctlpts=&perc1 &perc2 pctlpre=&var2;
run;

%if &trim=1 %then
%let condition = %str(if myvars(i)>=perct2(i) or myvars(i)<=perct1(i) then myvars(i)=. );
%else %let condition = %str(myvars(i)=min(perct2(i),max(perct1(i),myvars(i))) );

%if &trim=0 %then %put ### Winsorizing at &perc1.%... ;
%else %put ### Trimming at &perc1.%... ;

/* Save output with trimmed/winsorized variables */
data &outset;
merge &inset (in=a) _perc;
by &sortvar;
if a;
array myvars {&nvars} &vars;
array perct1 {&nvars} &var_p1;
array perct2 {&nvars} &var_p2;
do i = 1 to &nvars;
if not missing(myvars(i)) then
do;
&condition;
end;
end;
drop i &var_p1 &var_p2;
run;

/* House Cleaning */
proc sql; drop table _perc; quit;
options notes;

%put ### DONE . ; %put ;

%MEND WINSORIZE;

 

 


Accepted Solutions
Solution
‎03-05-2016 09:04 AM
Trusted Advisor
Posts: 1,116

Re: Winsorize macro

Aha! :-)

 

Well, since you entered your parameters already in the macro definition (as default values), you can run the macro with these default parameter values simply by submitting

%WINSORIZE;

To run it with parameter values different from the default values you specify the parameter names and values in the same way as they are written in the macro definition.

 

Example:

%winsorize(inset=have, sortvar=quarter);

This would use dataset HAVE as INSET and variable QUARTER as SORTVAR. All other parameters would use their defaults.

View solution in original post


All Replies
Super User
Posts: 11,141

Re: Winsorize macro

Describe what you mean by "doesn't work".

We don't have your data so are limited on testing.

If you get error messages, use OPTION MPRINT SYMBOLGEN; , run the code with the data and post the log with the error messages.

If you get no output or unexpected output we need some data and if unexpected results some details of the expected.

 

Also you apparently have at least on helper macro %nwords that we do not have, so cannot tell if it might be related to the issue.

Trusted Advisor
Posts: 1,116

Re: Winsorize macro

The results I obtained with test data and default settings of the macro after replacing %nwords(&vars) by 2 for the moment, are promising. So, you might just have to compile (or first write) macro NWORDS. But I could imagine that something like 

%let nvars = %sysfunc(countw(&vars));

could work as well. At least it does so with default settings.

Contributor
Posts: 23

Re: Winsorize macro

[ Edited ]

@FreelanceReinhard, I tried that code, but I still get no output. Woman Frustrated

 

@ballardw, I get no output at all. I see what you are saying about the nwords macro- I did not run that before.  I have it now, but I am not entirely sure how to use it.  Do I put my variables where it says "invar"?  I attached a small subsample of the data I am working with. I attached an excel file because I was having trouble attaching a sas file. Please let me know if it be easier for me to just type an example! The code (nwords macro and winsorizing macro) is as follows.

 

(Note: to begin, I am just trying to windsorize capxy and/or xrdy variables for each quarter. Eventually, I have a number of variables I intend to winsorize and will adjust the macro once I get it up and running.)

 

%MACRO NWORDS (INVAR);
%local N W;
/* %let invar = %sysfunc(compbl(&invar)); */
%let N = 0;
%let W = 1;
%do %while (%nrquote(%scan(&invar,&W,%str( ))) ^= %str());
%let N = %eval(&N+1);
%let W = %eval(&W+1);
%end;
&N
%MEND NWORDS;

 

 

%MACRO WINSORIZE (INSET=data.sashelp,OUTSET=winsor,SORTVAR=,VARS=capxy xrdy,PERC1=1,TRIM=0);

/* List of all variables */
%let vars = %sysfunc(compbl(&vars));
%let nvars = %nwords(&vars);


/* Display Output */
%put ### START.;

/* Trimming / Winsorization Options */
%if &trim=0 %then %put ### Winsorization; %else %put ### Trimming;
%put ### Number of Variables: &nvars;
%put ### List of Variables: &vars;
options nonotes;

/* Ranking within &sortvar levels */
%put ### Sorting... ;
proc sort data=&inset; by &sortvar; run;

/* 2-tail winsorization/trimming */
%let perc2 = %eval(100-&perc1);

%let var2 = %sysfunc(tranwrd(&vars,%str( ),%str(__ )))__;
%let var_p1 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc1 )))__&perc1 ;
%let var_p2 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc2 )))__&perc2 ;

/* Calculate upper and lower percentiles */
proc univariate data=&inset noprint;
by &sortvar;
var &vars;
output out=_perc pctlpts=&perc1 &perc2 pctlpre=&var2;
run;

%if &trim=1 %then
%let condition = %str(if myvars(i)>=perct2(i) or myvars(i)<=perct1(i) then myvars(i)=. );
%else %let condition = %str(myvars(i)=min(perct2(i),max(perct1(i),myvars(i))) );

%if &trim=0 %then %put ### Winsorizing at &perc1.%... ;
%else %put ### Trimming at &perc1.%... ;

/* Save output with trimmed/winsorized variables */
data &outset;
merge &inset (in=a) _perc;
by &sortvar;
if a;
array myvars {&nvars} &vars;
array perct1 {&nvars} &var_p1;
array perct2 {&nvars} &var_p2;
do i = 1 to &nvars;
if not missing(myvars(i)) then
do;
&condition;
end;
end;
drop i &var_p1 &var_p2;
run;

/* House Cleaning */
proc sql; drop table _perc; quit;
options notes;

%put ### DONE . ; %put ;
%MEND WINSORIZE;

 

 

Trusted Advisor
Posts: 1,116

Re: Winsorize macro

@klndsy: How strange that you didn't get any output, but I did. It would be interesting to see the log of your macro call after you've (temporarily) deleted or commented out the following line in the macro.

options nonotes;

As to macro NWORDS: It should work without any changes. Just compile it (i.e. submit the macro code) before you call macro WINSORIZE.

Contributor
Posts: 23

Re: Winsorize macro

@FreelanceReinhard Still nothing!!! I tried taking out the "options nonotes;" as well. Just to be clear, I am looking for a file named "winsor"- correct?  That is what I named for my outfile to be at the beginning of "Windsorize." 

 

My log never shows that the macro is complete, either. It is no longer running, but I do not get any confirmation that it was complete.  

 

Trusted Advisor
Posts: 1,116

Re: Winsorize macro

@klndsy: Don't worry. I'm pretty sure it's only a very minor issue, because the macro seemed to produce reasonable results for me. I've just run it again successfully, now including macro NWORDS.

 

You say that your "log never shows that the macro is complete." But you see the various %PUT statements in the macro code, such as

%put ### DONE . ;

at the end of the macro?

 

So, if not even these messages were written to the log (which is independent of option setting NOTES/NONOTES), it would seem to me that the code was not executed at all. Do you get any messages in the log if you run an arbitrary data or proc step or simply the above %PUT statement?

 

 

Contributor
Posts: 23

Re: Winsorize macro

[ Edited ]

I do see the "%put ### done.;" in the code.  

 

I copied a screenshot of my log when I run the macro.  It just reads the code, but it doesn't seem to be doing anything.  I just did a proc means to test if any messages are logging, and that logged just fine.

 

sashelp.jpg

Trusted Advisor
Posts: 1,116

Re: Winsorize macro

Your log doesn't show a macro call, but just the macro definition.

 

How did the macro call look like?

 

Btw, I've just run the macro with your data. Now I'm doing some checks on the results.

Contributor
Posts: 23

Re: Winsorize macro

@FreelanceReinhard oh my goodness; I don't think I called it. Ha! Quite obviously, I've never used a macro before. How do I do that? 

Solution
‎03-05-2016 09:04 AM
Trusted Advisor
Posts: 1,116

Re: Winsorize macro

Aha! :-)

 

Well, since you entered your parameters already in the macro definition (as default values), you can run the macro with these default parameter values simply by submitting

%WINSORIZE;

To run it with parameter values different from the default values you specify the parameter names and values in the same way as they are written in the macro definition.

 

Example:

%winsorize(inset=have, sortvar=quarter);

This would use dataset HAVE as INSET and variable QUARTER as SORTVAR. All other parameters would use their defaults.

Contributor
Posts: 23

Re: Winsorize macro

@FreelanceReinhard I can't believe it was that easy/silly this whole time.  Ha!!! Thank you so much for your help!

Super User
Posts: 11,141

Re: Winsorize macro

If providing data is best to provide a data step like this:

data have;
   input x y;
datalines;
123 456
23  789
45559 111.04
;

Excel has too many opportunities for your variables to be a different type if we have to 1)download a file 2) run proc import or other code.

 

The SAS datasets have potential issues from different operating systems, encoding, "bitness" and such.

Contributor
Posts: 23

Re: Winsorize macro

[ Edited ]

sure! my code is set up similar to this:

 

data have;
input firm quarter capxy rdxy;
datalines;

1 2003.1 0.023 0.053
2 2003.1 0.679 3.877
3 2003.1 0.636 0.000
4 2003.1 5.453 1.923
1 2003.2 0.543 0.043
2 2003.2 0.921 1.924
3 2003.2 0.843 0.000
4 2003.2 0.982 1.482

 

 

Trusted Advisor
Posts: 1,116

Re: Winsorize macro

Just to let you know: The Winsorized means obtained from the macro's output dataset will typically be slightly different from the corresponding Winsorized means obtained from PROC UNIVARIATE. This is due to different definitions of the relevant percentiles.

 

Example (based on your big dataset with 17427 observations):

For CAPXY in quarter 2003.1 your macro uses the 99% quantile, 550, as the replacement for values beyond this threshold. PROC UNIVARIATE, however, uses the 21st largest value, 537, as the threshold. The Winsorized means are approx. 22.86 vs. 22.73, respectively.

 

I have to call it a day now (or rather a night, here in Central Europe) and will be back tomorrow.

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 14 replies
  • 1157 views
  • 0 likes
  • 3 in conversation