- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dear all,
I have about 30 variables for which I amend outliers beyond the 97-percentile to the value of the individual 97-percentile cut-off. Identifying the 97-percentile I use proc univariate. The variables are all some kind of income variables and an income of zero should be excluded from the procedure.
Of course, I could add (where var NE 0) in the data selection, but I would rather not want to repeat the code 30+ times.
Can anybody help? My internet research hasn't delivered any solution.
Thank you very much in advance.
Gerit
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @MsGeritO,
You can also include the DATA step into the macro where you "cap" the variables at their respective 97th percentiles:
%macro capvar(data=, /* input dataset */
out=, /* output dataset */
varlist=, /* list of numeric variables to be capped */
pctl= /* cut-off percentile (e.g. 97) */
);
%local i nv var;
%let nv=%sysfunc(countw(&varlist));
/* Compute percentiles */
%do i=1 %to &nv;
%let var=%scan(&varlist,&i);
proc univariate data=&data noprint;
where &var ne 0;
var &var;
output out=_p&i pctlpts=&pctl pctlpre=_ pctlname=pctl;
run;
%end;
/* Write percentiles to macro variables */
data _null_;
set _p1-_p&nv;
call symputx(cat('_p',_n_),_pctl,'L');
run;
/* Create output dataset */
data &out;
set &data;
%do i=1 %to &nv;
%let var=%scan(&varlist,&i);
if &var>&&_p&i then &var=&&_p&i;
%end;
run;
%mend capvar;
Example call of the macro:
%capvar(data=sashelp.heart, out=want, varlist=Weight Diastolic Systolic, pctl=97);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A macro would work here. Example:
%macro dothis;
proc contents data=have noprint out=_contents_;
run;
proc sql noprint;
select name into :names separated by ' ' from _contents_;
quit;
%do i=1 %to %sqlobs;
%let thisvar=%scan(&names,&i,%str( ));
/* Add any other options to PROC UNIVARIATE that you want */
proc univariate data=have(where=(&thisvar^=0));
var &thisvar;
run;
%end;
%mend;
%dothis
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello @MsGeritO,
You can also include the DATA step into the macro where you "cap" the variables at their respective 97th percentiles:
%macro capvar(data=, /* input dataset */
out=, /* output dataset */
varlist=, /* list of numeric variables to be capped */
pctl= /* cut-off percentile (e.g. 97) */
);
%local i nv var;
%let nv=%sysfunc(countw(&varlist));
/* Compute percentiles */
%do i=1 %to &nv;
%let var=%scan(&varlist,&i);
proc univariate data=&data noprint;
where &var ne 0;
var &var;
output out=_p&i pctlpts=&pctl pctlpre=_ pctlname=pctl;
run;
%end;
/* Write percentiles to macro variables */
data _null_;
set _p1-_p&nv;
call symputx(cat('_p',_n_),_pctl,'L');
run;
/* Create output dataset */
data &out;
set &data;
%do i=1 %to &nv;
%let var=%scan(&varlist,&i);
if &var>&&_p&i then &var=&&_p&i;
%end;
run;
%mend capvar;
Example call of the macro:
%capvar(data=sashelp.heart, out=want, varlist=Weight Diastolic Systolic, pctl=97);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you. This does exactly what I needed and works perfectly.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Temporary data set where you replace the 0 values with missing.
Something like:
data temp; set have; array inc <list of income variable names goes here>; do _i_ = 1 to dim(inc); if inc[_i_]=0 then [_i_]=.; end; run;
And use that set for proc univariate to find your outliers.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content