BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
klndsy
Calcite | Level 5

I cannot get my Winsorize macro to work! I have tried multiple macros (to no avail), but this one is from the most reliable source.  I have figured it out roughly using the proc univariate command, but I have a number of variables to winsorize so a macro will be much more efficient.

 

Thank you in advance for your input.

 

 

%MACRO WINSORIZE (INSET=have,OUTSET=want,SORTVAR=quarter,VARS=capxy xrdy,PERC1=1,TRIM=0);

/* List of all variables */
%let vars = %sysfunc(compbl(&vars));
%let nvars = %nwords(&vars);

/* Display Output */
%put ### START.;

/* Trimming / Winsorization Options */
%if &trim=0 %then %put ### Winsorization; %else %put ### Trimming;
%put ### Number of Variables: &nvars;
%put ### List of Variables: &vars;
options nonotes;

/* Ranking within &sortvar levels */
%put ### Sorting... ;
proc sort data=&inset; by &sortvar; run;

/* 2-tail winsorization/trimming */
%let perc2 = %eval(100-&perc1);

%let var2 = %sysfunc(tranwrd(&vars,%str( ),%str(__ )))__;
%let var_p1 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc1 )))__&perc1 ;
%let var_p2 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc2 )))__&perc2 ;

/* Calculate upper and lower percentiles */
proc univariate data=&inset noprint;
by &sortvar;
var &vars;
output out=_perc pctlpts=&perc1 &perc2 pctlpre=&var2;
run;

%if &trim=1 %then
%let condition = %str(if myvars(i)>=perct2(i) or myvars(i)<=perct1(i) then myvars(i)=. );
%else %let condition = %str(myvars(i)=min(perct2(i),max(perct1(i),myvars(i))) );

%if &trim=0 %then %put ### Winsorizing at &perc1.%... ;
%else %put ### Trimming at &perc1.%... ;

/* Save output with trimmed/winsorized variables */
data &outset;
merge &inset (in=a) _perc;
by &sortvar;
if a;
array myvars {&nvars} &vars;
array perct1 {&nvars} &var_p1;
array perct2 {&nvars} &var_p2;
do i = 1 to &nvars;
if not missing(myvars(i)) then
do;
&condition;
end;
end;
drop i &var_p1 &var_p2;
run;

/* House Cleaning */
proc sql; drop table _perc; quit;
options notes;

%put ### DONE . ; %put ;

%MEND WINSORIZE;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Aha! 🙂

 

Well, since you entered your parameters already in the macro definition (as default values), you can run the macro with these default parameter values simply by submitting

%WINSORIZE;

To run it with parameter values different from the default values you specify the parameter names and values in the same way as they are written in the macro definition.

 

Example:

%winsorize(inset=have, sortvar=quarter);

This would use dataset HAVE as INSET and variable QUARTER as SORTVAR. All other parameters would use their defaults.

View solution in original post

14 REPLIES 14
ballardw
Super User

Describe what you mean by "doesn't work".

We don't have your data so are limited on testing.

If you get error messages, use OPTION MPRINT SYMBOLGEN; , run the code with the data and post the log with the error messages.

If you get no output or unexpected output we need some data and if unexpected results some details of the expected.

 

Also you apparently have at least on helper macro %nwords that we do not have, so cannot tell if it might be related to the issue.

FreelanceReinh
Jade | Level 19

The results I obtained with test data and default settings of the macro after replacing %nwords(&vars) by 2 for the moment, are promising. So, you might just have to compile (or first write) macro NWORDS. But I could imagine that something like 

%let nvars = %sysfunc(countw(&vars));

could work as well. At least it does so with default settings.

klndsy
Calcite | Level 5

@FreelanceReinh, I tried that code, but I still get no output. Woman Frustrated

 

@ballardw, I get no output at all. I see what you are saying about the nwords macro- I did not run that before.  I have it now, but I am not entirely sure how to use it.  Do I put my variables where it says "invar"?  I attached a small subsample of the data I am working with. I attached an excel file because I was having trouble attaching a sas file. Please let me know if it be easier for me to just type an example! The code (nwords macro and winsorizing macro) is as follows.

 

(Note: to begin, I am just trying to windsorize capxy and/or xrdy variables for each quarter. Eventually, I have a number of variables I intend to winsorize and will adjust the macro once I get it up and running.)

 

%MACRO NWORDS (INVAR);
%local N W;
/* %let invar = %sysfunc(compbl(&invar)); */
%let N = 0;
%let W = 1;
%do %while (%nrquote(%scan(&invar,&W,%str( ))) ^= %str());
%let N = %eval(&N+1);
%let W = %eval(&W+1);
%end;
&N
%MEND NWORDS;

 

 

%MACRO WINSORIZE (INSET=data.sashelp,OUTSET=winsor,SORTVAR=,VARS=capxy xrdy,PERC1=1,TRIM=0);

/* List of all variables */
%let vars = %sysfunc(compbl(&vars));
%let nvars = %nwords(&vars);


/* Display Output */
%put ### START.;

/* Trimming / Winsorization Options */
%if &trim=0 %then %put ### Winsorization; %else %put ### Trimming;
%put ### Number of Variables: &nvars;
%put ### List of Variables: &vars;
options nonotes;

/* Ranking within &sortvar levels */
%put ### Sorting... ;
proc sort data=&inset; by &sortvar; run;

/* 2-tail winsorization/trimming */
%let perc2 = %eval(100-&perc1);

%let var2 = %sysfunc(tranwrd(&vars,%str( ),%str(__ )))__;
%let var_p1 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc1 )))__&perc1 ;
%let var_p2 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc2 )))__&perc2 ;

/* Calculate upper and lower percentiles */
proc univariate data=&inset noprint;
by &sortvar;
var &vars;
output out=_perc pctlpts=&perc1 &perc2 pctlpre=&var2;
run;

%if &trim=1 %then
%let condition = %str(if myvars(i)>=perct2(i) or myvars(i)<=perct1(i) then myvars(i)=. );
%else %let condition = %str(myvars(i)=min(perct2(i),max(perct1(i),myvars(i))) );

%if &trim=0 %then %put ### Winsorizing at &perc1.%... ;
%else %put ### Trimming at &perc1.%... ;

/* Save output with trimmed/winsorized variables */
data &outset;
merge &inset (in=a) _perc;
by &sortvar;
if a;
array myvars {&nvars} &vars;
array perct1 {&nvars} &var_p1;
array perct2 {&nvars} &var_p2;
do i = 1 to &nvars;
if not missing(myvars(i)) then
do;
&condition;
end;
end;
drop i &var_p1 &var_p2;
run;

/* House Cleaning */
proc sql; drop table _perc; quit;
options notes;

%put ### DONE . ; %put ;
%MEND WINSORIZE;

 

 

FreelanceReinh
Jade | Level 19

@klndsy: How strange that you didn't get any output, but I did. It would be interesting to see the log of your macro call after you've (temporarily) deleted or commented out the following line in the macro.

options nonotes;

As to macro NWORDS: It should work without any changes. Just compile it (i.e. submit the macro code) before you call macro WINSORIZE.

klndsy
Calcite | Level 5

@FreelanceReinh Still nothing!!! I tried taking out the "options nonotes;" as well. Just to be clear, I am looking for a file named "winsor"- correct?  That is what I named for my outfile to be at the beginning of "Windsorize." 

 

My log never shows that the macro is complete, either. It is no longer running, but I do not get any confirmation that it was complete.  

 

FreelanceReinh
Jade | Level 19

@klndsy: Don't worry. I'm pretty sure it's only a very minor issue, because the macro seemed to produce reasonable results for me. I've just run it again successfully, now including macro NWORDS.

 

You say that your "log never shows that the macro is complete." But you see the various %PUT statements in the macro code, such as

%put ### DONE . ;

at the end of the macro?

 

So, if not even these messages were written to the log (which is independent of option setting NOTES/NONOTES), it would seem to me that the code was not executed at all. Do you get any messages in the log if you run an arbitrary data or proc step or simply the above %PUT statement?

 

 

klndsy
Calcite | Level 5

I do see the "%put ### done.;" in the code.  

 

I copied a screenshot of my log when I run the macro.  It just reads the code, but it doesn't seem to be doing anything.  I just did a proc means to test if any messages are logging, and that logged just fine.

 

sashelp.jpg

FreelanceReinh
Jade | Level 19

Your log doesn't show a macro call, but just the macro definition.

 

How did the macro call look like?

 

Btw, I've just run the macro with your data. Now I'm doing some checks on the results.

klndsy
Calcite | Level 5

@FreelanceReinh oh my goodness; I don't think I called it. Ha! Quite obviously, I've never used a macro before. How do I do that? 

FreelanceReinh
Jade | Level 19

Aha! 🙂

 

Well, since you entered your parameters already in the macro definition (as default values), you can run the macro with these default parameter values simply by submitting

%WINSORIZE;

To run it with parameter values different from the default values you specify the parameter names and values in the same way as they are written in the macro definition.

 

Example:

%winsorize(inset=have, sortvar=quarter);

This would use dataset HAVE as INSET and variable QUARTER as SORTVAR. All other parameters would use their defaults.

klndsy
Calcite | Level 5

@FreelanceReinh I can't believe it was that easy/silly this whole time.  Ha!!! Thank you so much for your help!

ballardw
Super User

If providing data is best to provide a data step like this:

data have;
   input x y;
datalines;
123 456
23  789
45559 111.04
;

Excel has too many opportunities for your variables to be a different type if we have to 1)download a file 2) run proc import or other code.

 

The SAS datasets have potential issues from different operating systems, encoding, "bitness" and such.

klndsy
Calcite | Level 5

sure! my code is set up similar to this:

 

data have;
input firm quarter capxy rdxy;
datalines;

1 2003.1 0.023 0.053
2 2003.1 0.679 3.877
3 2003.1 0.636 0.000
4 2003.1 5.453 1.923
1 2003.2 0.543 0.043
2 2003.2 0.921 1.924
3 2003.2 0.843 0.000
4 2003.2 0.982 1.482

 

 

FreelanceReinh
Jade | Level 19

Just to let you know: The Winsorized means obtained from the macro's output dataset will typically be slightly different from the corresponding Winsorized means obtained from PROC UNIVARIATE. This is due to different definitions of the relevant percentiles.

 

Example (based on your big dataset with 17427 observations):

For CAPXY in quarter 2003.1 your macro uses the 99% quantile, 550, as the replacement for values beyond this threshold. PROC UNIVARIATE, however, uses the 21st largest value, 537, as the threshold. The Winsorized means are approx. 22.86 vs. 22.73, respectively.

 

I have to call it a day now (or rather a night, here in Central Europe) and will be back tomorrow.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 3792 views
  • 0 likes
  • 3 in conversation