How do i remove outliers from a dataset?

WorkingMan · Posted 11-19-2018 12:31 AM

Hi,

I have a large dataset and I want to remove outliers. Before i proceed further, I've done some research online and below is my code:

proc sql noprint;
	create table ppussu_sum as
	select substationname
	, substationcode
	, voltagetype
	, substationtype
	, datetime
	, sum(power_calculated) as sum_power
	from ncpdm.ncp_loadpf_&srcnm._&period
	where (substationtype = 'PPU' and feedertype = 'TXF')
		or (substationtype = 'SSU' and feedertype = 'OF')
	group by substationname, substationcode, voltagetype, substationtype, datetime
	;
quit;



proc univariate data = ppussu_sum;
var sum_power;
output out=boxStats median=median qrange = iqr;
run; 

data _null_;
	set boxStats;
	call symput ('median',median);
	call symput ('iqr', iqr);
run; 

%put &median;
%put &iqr;

data trimmed;
set ppussu_sum;
    if (sum_power le &median + 1.5 * &iqr) and (sum_power ge &median - 1.5 * &iqr); 
run;

As you all can see, first i use proc univariate and then remove based on the formula. Just wondering if this is actually the correct way of doing it or is there another more accurate/appropriate way.

PeterClemmensen · Posted 11-19-2018 01:17 AM

What do you want to do with the data set without outliers? If you want compute some statistic or run some regression, the appropriate SAS procedure usually has options to robustly deal with outliers directly

The DATA to DATA Step Macro
Blog: SASnrd

WorkingMan · Posted 11-19-2018 01:26 AM

without outliers, data wont have sudden spike when displaying in visual dashboard. This is the only reason why i want to remove it.

Astounding · Posted 11-19-2018 06:43 AM

Note that there is an alternative approach ... capping the outliers instead of removing them. For example:

   if (sum_power > &median + 1.5 * &iqr) then sum_power = &median + 1.5 * &iqr);

How do i remove outliers from a dataset?

Re: How do i remove outliers from a dataset?

Re: How do i remove outliers from a dataset?

Re: How do i remove outliers from a dataset?

Catch up on SAS Innovate 2026

How do i remove outliers from a dataset?

Re: How do i remove outliers from a dataset?

Re: How do i remove outliers from a dataset?

Re: How do i remove outliers from a dataset?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away