Programming the statistical procedures from SAS

ROBUSTREG deals with outliers

I spent a few weeks using PROC REG to run regression analyses for more than 100 servers, characterizing many different things. I ran into a problem: outliers. They skewed the data terribly, causing some analyses to produce negative values at +3 standard deviations (which should be impossible). So I found PROC ROBUSTREG, and it works, sort of, but with pain.

It took me 2 months to get it working because at first I assumed I was the problem. It turns out ROBUSTREG was not that well written, and after all my pain I think the SAS dudes are rewriting it, hopefully. Dr. Paul and his team at SAS were very helpful in figuring out the workarounds.

Anyways, to get it to work well, here is what it took.

1) Break the data up: use a "grinder" to present pieces of by-group data. ROBUSTREG tries to load all the data into memory, not just one by-group at a time.

2) Use method=S, not the default method=M (although method=M should be the correct procedure, it was fraught with problems). S gave the fastest and best answers. LTS also worked, mostly.

3) "Jitter" the data. A perfect fit causes problems as well, so force a minuscule variance into the data so that the procedure actually has to do some work (no zeroes).

4) Ensure there are at least 7 observations in each by-group.
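Step 4 can be handled before the data ever reaches ROBUSTREG. The following is only a minimal sketch, not the code I actually ran: the data set WORK.RAW and the variables SERVER and Y are hypothetical stand-ins for your own input, by-variable, and dependent variable. It relies on PROC SQL remerging the group count back onto the detail rows.

```sas
/* Sketch: keep only by-groups with at least 7 usable observations. */
/* WORK.RAW, SERVER and Y are hypothetical names for illustration.  */
proc sql;
   create table work.enough as
   select *
   from work.raw
   group by server
   having count(y) >= 7      /* count(y) ignores missing values of y */
   order by server;
quit;
```

Counting the dependent variable rather than the rows is a design choice: observations with a missing response are unlikely to help the fit, so they should probably not count toward the minimum.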

I needed the output tables, so I had to use ODS OUTPUT to write them to data sets, which is a good thing, mostly. The catch: if the fit is "perfect" or the procedure fails in some way, it produces no parameter table at all, rather than an empty one.
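Because the parameter table may simply not exist after a run, downstream code probably needs to test for it first. A minimal sketch of one way to do that, assuming the ODS OUTPUT data set is a one-level WORK name; the macro name %check_parms and the accumulator data set _all_parms_ are hypothetical:

```sas
/* Sketch: use the ODS OUTPUT data set only if it was actually created. */
/* %check_parms and _all_parms_ are hypothetical names.                 */
%macro check_parms(ds=_parms_);
   %if %sysfunc(exist(&ds)) %then %do;
      /* accumulate this run's estimates */
      proc append base=_all_parms_ data=&ds force;
      run;
      /* delete it so a stale copy is not mistaken for the next run's result */
      proc datasets lib=work nolist;
         delete &ds;
      quit;
   %end;
   %else %put NOTE: &ds was not created, skipping this piece.;
%mend check_parms;
```

It would be called right after each ROBUSTREG step, for example `%check_parms(ds=_parms_)`. Deleting the data set after use matters because a leftover table from the previous piece would otherwise pass the existence test.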

Jittering =
/* pick a random sign, then nudge the dependent variable by a tiny amount */
if ranuni(0) > .5 then bias = +1.0;
else bias = -1.0;
&dependent = &dependent + 0.000001*bias*&dependent + ranuni(0)*0.00001*bias;
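For anyone who wants to try the jitter on its own, here is a self-contained sketch of the same idea as a complete DATA step. It is an illustration under assumptions, not my production code: WORK.RAW and Y are hypothetical names, and it swaps in RAND('UNIFORM') with a fixed STREAMINIT seed instead of RANUNI(0) so that reruns give the same jitter.

```sas
/* Sketch: jitter the dependent variable so no by-group is a perfect fit. */
/* WORK.RAW and Y are hypothetical names; the seed 12345 is arbitrary.    */
data work.jittered;
   set work.raw;
   if _n_ = 1 then call streaminit(12345);   /* fixed seed: repeatable jitter */
   if rand('uniform') > 0.5 then bias = 1;   /* random sign */
   else bias = -1;
   /* tiny relative nudge plus a tiny absolute nudge, both in the same direction */
   y = y + 0.000001*bias*y + rand('uniform')*0.00001*bias;
   drop bias;
run;
```

The absolute term is what keeps a value of exactly zero from staying zero; the relative term alone would leave it untouched.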

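Proc call =
/* subset to the current piece of data handed over by the grinder */
data _rr_indata;
set _inset_(where=(&where_clause));
run;

/* S estimation with final weighted least squares; capture the ODS tables */
proc robustreg data=_rr_indata fwls method=S;
by &by_var dow timestamp ;
model &dependent = i ;
ods output goodfit=_fit_ parameterestimatesF=_parms_ NObs=_nobs_ ;
run;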

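the "grinder" =
/* hold the macro's name as quoted text so it is not invoked here */
%let robustreg = %qsubstr('%robustreg',2,10);

/* build one macro call per row of _things_ and queue it with CALL EXECUTE */
data _null_;
length statement $256 where_clause $128;
set _things_;
retain cnt 0;
where_clause = &where_clause ;
statement = '%superq(ROBUSTREG)(' || "'" ||trim(where_clause)|| "'," || trim(left(cnt)) || ")" ;
call execute(statement);
run;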

There is more code involved in this, but this gives the gist of what has to be done.

With all this done, the outliers are successfully eliminated from the regression and the results are what they should be, which allows me to use the regression results to predict "normal" behavior and identify future outliers.

Message was edited by: Chuck