BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RobertNYC
Obsidian | Level 7

Hi all,

 

I have been working on some product purchase propensity models and have started using actual Dollar amount variables.   It was recommended to me that a cap extreme/ outlining values before inputting them into a model.  I literally have hundreds of dollar value variables and in order to do this it will obviously take a tremendous amount of time.   

 

Does anyone have a macro or familiar with a macro which can accomplish this large task? I have been searching around the web and having trouble finding a macro and the ones I have found are difficult to understand (I’m not much of a macro guy).

 

So,

 

1.a macro which will determine the appropriate “cap” based on the dollar distributions and then  “cap” the variables

 

and/or

 

2.a macro which will let me select a percentile value (say 99) and then cap all variables at or hier than that value

 

Any assistance will be greatly appreciated!! Thanks! 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Replace the sections in RED. You need to provide the data set name and the variable list to the proc means. In the data step at the bottom, make sure to also specify the dataset to be capped. The rest of the code is run as is. 

 

*Calculate IQR and first/third quartiles;
proc means data=[YOUR DATA SET NAME] stackods n qrange p1 p99;
var [YOUR VARIABLE LIST GOES HERE];
ods output summary=ranges;
run;

 

 

*macro to cap outliers, run the rest as is

%macro cap(dset=,var=, lower=, upper=);

data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;

%mend;


*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=[Your Dataset Name], var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;

View solution in original post

5 REPLIES 5
RobertNYC
Obsidian | Level 7

I'm actually probably just looking for a macro which will cap all values above the 99th percentile not a macro which determine the appropriate cap and then cap the vars

Reeza
Super User

Here's one I wrote that does it for upper/lower. Change the PROC MEANS if you want 99% and 1%. 

 

I don't know how efficient it is in terms of code, but in terms of your programming time, it's pretty much exactly what you want. I think Rick Wicklin has a recent post on winsorization. http://blogs.sas.com/content/iml/2017/02/08/winsorization-good-bad-and-ugly.html

 

As noted, not sure if this handles missing correctly - I suspect it doesn't but it would be an easy fix. 

 

https://gist.github.com/statgeek/31316a678433a1db8136

 

*Calculate IQR and first/third quartiles;
proc means data=sashelp.class stackods n qrange p1 p99;
var weight height;
ods output summary=ranges;
run;

*create data with outliers to check;
data capped; 
	set sashelp.class;
	if name='Alfred' then weight=220;
	if name='Jane' then height=-30;
run;

*macro to cap outliers;

%macro cap(dset=,var=, lower=, upper=);

data &dset;
	set &dset;
	if &var>&upper then &var=&upper;
	if &var<&lower then &var=&lower;
run;

%mend;


*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;
RobertNYC
Obsidian | Level 7

Hi Rezza, 

 

Thanks so much. 

 

I'm a little confused how this works

 

proc means data=sashelp.class stackods n qrange p1 p99;
var weight height;
ods output summary=ranges;
run;

 

*create data with outliers to check;
I have millions of observations and Hundreds of variables I cant do this check for each observation
data capped;
set sashelp.class;
if name='Alfred' then weight=220;
if name='Jane' then height=-30;
run;

 

*macro to cap outliers;

Where do I input the variable list?

%macro cap(dset=,var=, lower=, upper=);

data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;

%mend;

 

Is the syntax I need to add look like this? 

%cap(daset=have var=list of vars lower=P1 upper=P99);

 

*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;

Reeza
Super User

@RobertNYC wrote:

Hi Rezza, 

 

Thanks so much. 

 

I'm a little confused how this works

 

proc means data=sashelp.class stackods n qrange p1 p99;
var weight height;
ods output summary=ranges;
run;

 

 

*create data with outliers to check;
I have millions of observations and Hundreds of variables I cant do this check for each observation

This is to create a sample dataset to test. You don't need this step, you use your original data
data capped;
set sashelp.class;
if name='Alfred' then weight=220;
if name='Jane' then height=-30;
run;

 

*macro to cap outliers;

Where do I input the variable list?

Variable list should be provided in the proc means VAR statement in the first step

%macro cap(dset=,var=, lower=, upper=);

data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;

%mend;

 

Is the syntax I need to add look like this? 

%cap(daset=have var=list of vars lower=P1 upper=P99);

No, the macro is automatically executed for every variable in the output from the proc means. Var is designed for a single variable, not multiple variables. 

 

*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=capped, var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;


 

I'll repost an answer that shows how you would use it.

Reeza
Super User

Replace the sections in RED. You need to provide the data set name and the variable list to the proc means. In the data step at the bottom, make sure to also specify the dataset to be capped. The rest of the code is run as is. 

 

*Calculate IQR and first/third quartiles;
proc means data=[YOUR DATA SET NAME] stackods n qrange p1 p99;
var [YOUR VARIABLE LIST GOES HERE];
ods output summary=ranges;
run;

 

 

*macro to cap outliers, run the rest as is

%macro cap(dset=,var=, lower=, upper=);

data &dset;
set &dset;
if &var>&upper then &var=&upper;
if &var<&lower then &var=&lower;
run;

%mend;


*create cutoffs and execute macro for each variable;
data cutoffs;
set ranges;
lower=p1;
upper=p99;
string = catt('%cap(dset=[Your Dataset Name], var=', variable, ", lower=", lower, ", upper=", upper ,");");
call execute(string);
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 3438 views
  • 2 likes
  • 2 in conversation