SAS MACRO: Capping Outliers for Dollar amount variables

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 101
Accepted Solution

SAS MACRO: Capping Outliers for Dollar amount variables

Hi all,

 

I have been working on some product purchase propensity models and have started using actual Dollar amount variables.   It was recommended to me that a cap extreme/ outlining values before inputting them into a model.  I literally have hundreds of dollar value variables and in order to do this it will obviously take a tremendous amount of time.   

 

Does anyone have a macro or familiar with a macro which can accomplish this large task? I have been searching around the web and having trouble finding a macro and the ones I have found are difficult to understand (I’m not much of a macro guy).

 

So,

 

1.a macro which will determine the appropriate “cap” based on the dollar distributions and then  “cap” the variables

 

and/or

 

2.a macro which will let me select a percentile value (say 99) and then cap all variables at or hier than that value

 

Any assistance will be greatly appreciated!! Thanks! 


Accepted Solutions
Solution
‎02-22-2017 03:27 PM
Super User
Posts: 19,167

Re: SAS MACRO: Capping Outliers for Dollar amount variables

Replace the sections in RED. You need to provide the data set name and the variable list to the proc means. In the data step at the bottom, make sure to also specify the dataset to be capped. The rest of the code is run as is. 

 

*Calculate IQR and first/third quartiles;
proc means data=[YOUR DATA SET NAME] stackods n qrange p1 p99;
var [YOUR VARIABLE LIST GOES HERE];
ods output summary=ranges;
run;

 

 

*macro to cap outliers, run the rest as is

%macro cap(dset=,var=, lower=, upper=);

data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;

%mend;


*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=[Your Dataset Name], var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;

View solution in original post


All Replies
Frequent Contributor
Posts: 101

Re: SAS MACRO: Capping Outliers for Dollar amount variables

I'm actually probably just looking for a macro which will cap all values above the 99th percentile not a macro which determine the appropriate cap and then cap the vars

Super User
Posts: 19,167

Re: SAS MACRO: Capping Outliers for Dollar amount variables

Here's one I wrote that does it for upper/lower. Change the PROC MEANS if you want 99% and 1%. 

 

I don't know how efficient it is in terms of code, but in terms of your programming time, it's pretty much exactly what you want. I think Rick Wicklin has a recent post on winsorization. http://blogs.sas.com/content/iml/2017/02/08/winsorization-good-bad-and-ugly.html

 

As noted, not sure if this handles missing correctly - I suspect it doesn't but it would be an easy fix. 

 

https://gist.github.com/statgeek/31316a678433a1db8136

 

*Calculate IQR and first/third quartiles;
proc means data=sashelp.class stackods n qrange p1 p99;
var weight height;
ods output summary=ranges;
run;

*create data with outliers to check;
data capped; 
	set sashelp.class;
	if name='Alfred' then weight=220;
	if name='Jane' then height=-30;
run;

*macro to cap outliers;

%macro cap(dset=,var=, lower=, upper=);

data &dset;
	set &dset;
	if &var>&upper then &var=&upper;
	if &var<&lower then &var=&lower;
run;

%mend;


*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;
Frequent Contributor
Posts: 101

Re: SAS MACRO: Capping Outliers for Dollar amount variables

Hi Rezza, 

 

Thanks so much. 

 

I'm a little confused how this works

 

proc means data=sashelp.class stackods n qrange p1 p99;
var weight height;
ods output summary=ranges;
run;

 

*create data with outliers to check;
I have millions of observations and Hundreds of variables I cant do this check for each observation
data capped;
set sashelp.class;
if name='Alfred' then weight=220;
if name='Jane' then height=-30;
run;

 

*macro to cap outliers;

Where do I input the variable list?

%macro cap(dset=,var=, lower=, upper=);

data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;

%mend;

 

Is the syntax I need to add look like this? 

%cap(daset=have var=list of vars lower=P1 upper=P99);

 

*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;

Super User
Posts: 19,167

Re: SAS MACRO: Capping Outliers for Dollar amount variables


RobertNYC wrote:

Hi Rezza, 

 

Thanks so much. 

 

I'm a little confused how this works

 

proc means data=sashelp.class stackods n qrange p1 p99;
var weight height;
ods output summary=ranges;
run;

 

 

*create data with outliers to check;
I have millions of observations and Hundreds of variables I cant do this check for each observation

This is to create a sample dataset to test. You don't need this step, you use your original data
data capped;
set sashelp.class;
if name='Alfred' then weight=220;
if name='Jane' then height=-30;
run;

 

*macro to cap outliers;

Where do I input the variable list?

Variable list should be provided in the proc means VAR statement in the first step

%macro cap(dset=,var=, lower=, upper=);

data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;

%mend;

 

Is the syntax I need to add look like this? 

%cap(daset=have var=list of vars lower=P1 upper=P99);

No, the macro is automatically executed for every variable in the output from the proc means. Var is designed for a single variable, not multiple variables. 

 

*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;


 

I'll repost an answer that shows how you would use it.

Solution
‎02-22-2017 03:27 PM
Super User
Posts: 19,167

Re: SAS MACRO: Capping Outliers for Dollar amount variables

Replace the sections in RED. You need to provide the data set name and the variable list to the proc means. In the data step at the bottom, make sure to also specify the dataset to be capped. The rest of the code is run as is. 

 

*Calculate IQR and first/third quartiles;
proc means data=[YOUR DATA SET NAME] stackods n qrange p1 p99;
var [YOUR VARIABLE LIST GOES HERE];
ods output summary=ranges;
run;

 

 

*macro to cap outliers, run the rest as is

%macro cap(dset=,var=, lower=, upper=);

data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;

%mend;


*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=[Your Dataset Name], var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 356 views
  • 0 likes
  • 2 in conversation