topic Re: How do i remove outliers from a dataset? in SAS Programming

How do i remove outliers from a dataset?

WorkingMan — Mon, 19 Nov 2018 05:31:28 GMT

Hi,

I have a large dataset and I want to remove outliers. Before i proceed further, I've done some research online and below is my code:

proc sql noprint;
	create table ppussu_sum as
	select substationname
	, substationcode
	, voltagetype
	, substationtype
	, datetime
	, sum(power_calculated) as sum_power
	from ncpdm.ncp_loadpf_&srcnm._&period
	where (substationtype = 'PPU' and feedertype = 'TXF')
		or (substationtype = 'SSU' and feedertype = 'OF')
	group by substationname, substationcode, voltagetype, substationtype, datetime
	;
quit;



proc univariate data = ppussu_sum;
var sum_power;
output out=boxStats median=median qrange = iqr;
run; 

data _null_;
	set boxStats;
	call symput ('median',median);
	call symput ('iqr', iqr);
run; 

%put &median;
%put &iqr;

data trimmed;
set ppussu_sum;
    if (sum_power le &median + 1.5 * &iqr) and (sum_power ge &median - 1.5 * &iqr); 
run;

As you all can see, first i use proc univariate and then remove based on the formula. Just wondering if this is actually the correct way of doing it or is there another more accurate/appropriate way.

Re: How do i remove outliers from a dataset?

PeterClemmensen — Mon, 19 Nov 2018 06:17:37 GMT

What do you want to do with the data set without outliers? If you want compute some statistic or run some regression, the appropriate SAS procedure usually has options to robustly deal with outliers directly

Re: How do i remove outliers from a dataset?

WorkingMan — Mon, 19 Nov 2018 06:26:47 GMT

without outliers, data wont have sudden spike when displaying in visual dashboard. This is the only reason why i want to remove it.

Re: How do i remove outliers from a dataset?

Astounding — Mon, 19 Nov 2018 11:43:54 GMT

Note that there is an alternative approach ... capping the outliers instead of removing them. For example:

   if (sum_power > &median + 1.5 * &iqr) then sum_power = &median + 1.5 * &iqr);