Dear all,
I have about 30 variables for which I amend outliers beyond the 97-percentile to the value of the individual 97-percentile cut-off. Identifying the 97-percentile I use proc univariate. The variables are all some kind of income variables and an income of zero should be excluded from the procedure.
Of course, I could add (where var NE 0) in the data selection, but I would rather not want to repeat the code 30+ times.
Can anybody help? My internet research hasn't delivered any solution.
Thank you very much in advance.
Gerit
Hello @MsGeritO,
You can also include the DATA step into the macro where you "cap" the variables at their respective 97th percentiles:
%macro capvar(data=, /* input dataset */
out=, /* output dataset */
varlist=, /* list of numeric variables to be capped */
pctl= /* cut-off percentile (e.g. 97) */
);
%local i nv var;
%let nv=%sysfunc(countw(&varlist));
/* Compute percentiles */
%do i=1 %to &nv;
%let var=%scan(&varlist,&i);
proc univariate data=&data noprint;
where &var ne 0;
var &var;
output out=_p&i pctlpts=&pctl pctlpre=_ pctlname=pctl;
run;
%end;
/* Write percentiles to macro variables */
data _null_;
set _p1-_p&nv;
call symputx(cat('_p',_n_),_pctl,'L');
run;
/* Create output dataset */
data &out;
set &data;
%do i=1 %to &nv;
%let var=%scan(&varlist,&i);
if &var>&&_p&i then &var=&&_p&i;
%end;
run;
%mend capvar;
Example call of the macro:
%capvar(data=sashelp.heart, out=want, varlist=Weight Diastolic Systolic, pctl=97);
A macro would work here. Example:
%macro dothis;
proc contents data=have noprint out=_contents_;
run;
proc sql noprint;
select name into :names separated by ' ' from _contents_;
quit;
%do i=1 %to %sqlobs;
%let thisvar=%scan(&names,&i,%str( ));
/* Add any other options to PROC UNIVARIATE that you want */
proc univariate data=have(where=(&thisvar^=0));
var &thisvar;
run;
%end;
%mend;
%dothis
Hello @MsGeritO,
You can also include the DATA step into the macro where you "cap" the variables at their respective 97th percentiles:
%macro capvar(data=, /* input dataset */
out=, /* output dataset */
varlist=, /* list of numeric variables to be capped */
pctl= /* cut-off percentile (e.g. 97) */
);
%local i nv var;
%let nv=%sysfunc(countw(&varlist));
/* Compute percentiles */
%do i=1 %to &nv;
%let var=%scan(&varlist,&i);
proc univariate data=&data noprint;
where &var ne 0;
var &var;
output out=_p&i pctlpts=&pctl pctlpre=_ pctlname=pctl;
run;
%end;
/* Write percentiles to macro variables */
data _null_;
set _p1-_p&nv;
call symputx(cat('_p',_n_),_pctl,'L');
run;
/* Create output dataset */
data &out;
set &data;
%do i=1 %to &nv;
%let var=%scan(&varlist,&i);
if &var>&&_p&i then &var=&&_p&i;
%end;
run;
%mend capvar;
Example call of the macro:
%capvar(data=sashelp.heart, out=want, varlist=Weight Diastolic Systolic, pctl=97);
Thank you. This does exactly what I needed and works perfectly.
Temporary data set where you replace the 0 values with missing.
Something like:
data temp; set have; array inc <list of income variable names goes here>; do _i_ = 1 to dim(inc); if inc[_i_]=0 then [_i_]=.; end; run;
And use that set for proc univariate to find your outliers.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.