SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
WorkingMan
Calcite | Level 5

Hi,

 

I have a large dataset and I want to remove outliers. Before i proceed further, I've done some research online and below is my code:

proc sql noprint;
	create table ppussu_sum as
	select substationname
	, substationcode
	, voltagetype
	, substationtype
	, datetime
	, sum(power_calculated) as sum_power
	from ncpdm.ncp_loadpf_&srcnm._&period
	where (substationtype = 'PPU' and feedertype = 'TXF')
		or (substationtype = 'SSU' and feedertype = 'OF')
	group by substationname, substationcode, voltagetype, substationtype, datetime
	;
quit;



proc univariate data = ppussu_sum;
var sum_power;
output out=boxStats median=median qrange = iqr;
run; 

data _null_;
	set boxStats;
	call symput ('median',median);
	call symput ('iqr', iqr);
run; 

%put &median;
%put &iqr;

data trimmed;
set ppussu_sum;
    if (sum_power le &median + 1.5 * &iqr) and (sum_power ge &median - 1.5 * &iqr); 
run; 

 

As you all can see, first i use proc univariate and then remove based on the formula. Just wondering if this is actually the correct way of doing it or is there another more accurate/appropriate way.

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

What do you want to do with the data set without outliers? If you want compute some statistic or run some regression, the appropriate SAS procedure usually has options to robustly deal with outliers directly

WorkingMan
Calcite | Level 5

without outliers, data wont have sudden spike when displaying in visual dashboard. This is the only reason why i want to remove it.

Astounding
PROC Star

Note that there is an alternative approach ... capping the outliers instead of removing them.  For example:

 

   if (sum_power > &median + 1.5 * &iqr) then sum_power = &median + 1.5 * &iqr); 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 10601 views
  • 4 likes
  • 3 in conversation