Help using Base SAS procedures

How to identify and remove outlier?

Reply
Occasional Contributor
Posts: 11

How to identify and remove outlier?

Hi,

 

How can I identify outliers and remove them from my database? I used the command below to check the homoscedasticity of variance and normality of errors, as suggested by @SteveDenham but I don't know how to proceed after that.

 

proc glm;

class cast*drug;

model WBC = cast*drug;

means cast*drug / hovtest = levene(type=ABS) welch;

output out = resids r=residual; run;

 

proc univariate;

var residual;

qq plot = residual;

run;

 

 

Super User
Posts: 17,963

Re: How to identify and remove outlier?

1. Define how you're categorizing data as an outlier

2. Define rules for an outlier

3. Extract data as required

 

Which step are you on?

Occasional Contributor
Posts: 11

Re: How to identify and remove outlier?

My question is:

After checking the normality of ERRORS, and since PROC UNIVARIATE identifies the normality of residuals...how do I remove the outliers from the residuals?

Valued Guide
Posts: 505

Re: How to identify and remove outlier?

/* T0099710 Using Robustreg, Grubb macro, Outlier macro and Ricks tools to identify outliers from sashelp.bweight

Using Robustreg to identify outliers from sashelp.bweight

Ricks Blogs
/* T700509 Detecting outliers in SAS: Part 1: Estimating location
https://goo.gl/EGbdso
http://blogs.sas.com/content/iml/2012/01/20/detecting-outliers-in-sas-part-1-estimating-location.html

/* T700506 Detecting outliers in SAS: Part 2: Estimating scale
https://goo.gl/PhGnOU
http://blogs.sas.com/content/iml/2012/01/27/detecting-outliers-in-sas-part-2-estimating-scale.html

/* T700504 Detecting outliers in SAS: Part 3: Multivariate location and scatter
https://goo.gl/1pKqab
http://blogs.sas.com/content/iml/2012/02/02/detecting-outliers-in-sas-part-3-multivariate-location-and-scatter.html

R package outliers
https://cran.r-project.org/web/packages/outliers/outliers.pdf

My verification and validation tool does more extensive outlier detection
https://www.dropbox.com/s/po3ahepe7r7dnm7/oto_voodoo.sas?dl=0
see doc on end


HAVE a list of 50,000 infant births weights (sashelp.bweight)
=============================================================

Up to 10 obs from sashelp.bweight total obs=50,000

  Obs    WEIGHT

    1     4111
    2     3997
    3     3572
    4     1956
    5     3515
    6     3757
    7     2977
    8     3884
    9     3629
   10     3062

WANT 30 Worst outliers based on a stadardized normal using robustreg
     (may want to add the 30 with the highest leverage)
====================================================================

Outlier analsys up to the 30 outliers
Robust Regression with %syfunc(int()) * sigma cuttoff and removal of expected outliers?

Obs     OBS     SIGMAS    WEIGHT

  1    45253    6.1760      240
  2     4319    6.0902      284
  3    17687    6.0902      284
  4    18398    6.0355      312
  5     6789    6.0160      322
  6     7881    6.0004      330
  7     1754    5.9809      340
  8    44302    5.9809      340
  9    11413    5.9614      350
 10     5948    5.9244      369
....
 25    29420    5.7742      446
 26    19603    5.7586      454
 27    48553    5.7586      454
 28    43037    5.7586      454
 29    39248    5.7527      457
 30    31299    5.7469      460


GRUBS MACRO
=============

Up to 40 obs from sashelp.bweight total obs=50,000

                                                     MIN_       MAX_      MEAN_       STD_
Obs    GRBTEST    GRBALPHA    GRBOBS    GRBDROP    GRBVALS    GRBVALS    GRBVALS    GRBVALS    GRBCALC    GRBCRIT    GRBPSTAT

  1      Max        0.05       50000     34693       240        6350     3370.76    566.385    5.26010    4.75291     0.0036
  2      Min        0.05       49999     45253       240        5970     3370.70    566.234    5.52898    4.75291     0.0008
  3      Min        0.05       49998      4319       284        5970     3370.76    566.273    5.45101    4.75291     0.0012
  4      Min        0.05       49997     17687       284        5970     3370.82    566.317    5.45070    4.75291     0.0012
  5      Min        0.05       49996     18398       312        5970     3370.88    566.360    5.40095    4.75291     0.0017
  6      Min        0.05       49995      6789       322        5970     3370.94    566.407    5.38296    4.75291     0.0018
  7      Min        0.05       49994      7881       330        5970     3371.01    566.455    5.36849    4.75291     0.0020
  8      Min        0.05       49993      1754       340        5970     3371.07    566.504    5.35048    4.75291     0.0022
  9      Min        0.05       49992     44302       340        5970     3371.13    566.553    5.35012    4.75291     0.0022
 10      Min        0.05       49991     11413       350        5970     3371.19    566.603    5.33211    4.75291     0.0024
 11      Min        0.05       49990     16672       369        5970     3371.25    566.654    5.29820    4.75291     0.0029
 12      Min        0.05       49989     15094       369        5970     3371.31    566.707    5.29782    4.75291     0.0029
 13      Min        0.05       49988      5948       369        5970     3371.37    566.760    5.29743    4.75291     0.0029
 14      Min        0.05       49987     38068       369        5970     3371.43    566.812    5.29704    4.75291     0.0029
 15      Min        0.05       49986     26610       378        5970     3371.49    566.865    5.28078    4.75291     0.0032


WORKING SOLUTION
================

%_vdo_outlyr(
     lib =sashelp
    ,data=bweight
    ,var=weight
    );

%utl_Grubbs(work.bweight,weight,alpha=.05);


FULL SOLUTION
=============

* might be interesting to look at a subset;
data bweight;
  set sashelp.bweight(where=(1000 < weight ));
run;quit;

%_vdo_outlyr(
     lib =work
    ,data=bweight
    ,var=weight
    );

30 worst outlier analsys up to the 30 out 481 outliers identified) outliers
Robust Regression with 3.2886758223 * sigma cuttoff and removal of expected outliers?

Obs     OBS     SIGMAS    WEIGHT

  1    34539    5.8076     6350
  2    25551    5.0578     5970
  3    14897    4.7291     1010
  4    23172    4.7291     1010
  5    45693    4.7094     1020
  6    31525    4.7094     1020
  7    17803    4.7074     1021
  8    21606    4.7074     1021
  9    11799    4.7074     1021


%macro _vdo_outlyr(
     lib =sashelp
    ,data=bweight
    ,var=weight
    );

    /*
      %let lib =sashelp;
      %let data=bweight;
      %let var=weight;
    */

    %local nobs cutoff expected_outliers dotherest;

    * grubs test is better but it is too computationally expensive
      It recomputes thowing ot the worst outlier;

    *  I do have a couple of grub test macros in R and SAS;

    * arguments have already been checked by the call driver;

    * estimate the best cuttoff in terms of the sigma on standardized values;

    * number of obs in source data;
    proc sql noprint; select count(*) into :nobs from &lib..&data;quit;

    %put &=nobs;

    * get the best cutoff in terms of N * sigma and the expected outliers;
    data _null;
      * (e^x)^2) -> sqrt(log(x));
      cutoff=sqrt(log(&nobs));
      call symputx('cutoff',cutoff);
      put cutoff=;
      expected_outliers=2 * &nobs *(1-cdf('normal',cutoff));
      call symputx('expected_outliers',expected_outliers);
      put expected_outliers=;
    run;quit;

    * robust regression - only interested in outliers - leverage values might also be interesting;
    ods exclude all;
    ods output diagnostics=__vvdag;
    proc robustreg data=&lib..&data method=MM;
    model &var = /diagnostics  cutoff=&cutoff /* stadardized sigma &cutoff * sigma */;
    run;
    ods select all;
    ods listing;

    * number of potential outliers;
    proc sql noprint;select count(*) into :obs from __vvdag;quit;

    * do we have more than the expected outliers;
    %let dotherest=%sysfunc(ifc(&obs > &expected_outliers,1,0));

    %put &=dotherest;

    * do we have outliers;
    %if &dotherest %then %do; * YES? ;

        * set up for sort by abs value;
        data _vvdagabs;
         set __vvdag;
         rresidual=abs(rresidual);
        run;quit;

        /* sort by abs value so we can remove the lower expected_outliers */
        proc sort data=_vvdagabs out=_vvdagsrt noequals;
        by rresidual;
        run;quit;

        /* drop the lower values */
        data _vvdagsel;
         set _vvdagsrt(firstobs=%sysfunc(int(&expected_outliers))); * remove expected outliers;
        run;quit;

        * go back to full data and get orginal value;
        * get bad values;
        data _vvdagget;
          do until (dne);
            set _vvdagsel(drop=outlier) end=dne;
            rec=obs;
            set &lib..&data(keep=&var)  point=rec;
            output;
          end;
          stop;
        run;quit;

        proc sort data=_vvdagget  out=_vvdagfin noequals;
        by descending rresidual;
        run;quit;

        title1 ' ';title2 ' ';title3 ' ' ;
        TITLE4 "30 worst outlier analsys up to the 30 out &obs outliers identified) outliers";
        TITLE5 "Robust Regression with &cutoff * sigma cuttoff and removal of expected outliers?";

        proc print data=_vvdagfin(obs=30 rename=rresidual=sigmas) width=min;
        run;quit;
        title;
    %end;
    %else %do;
       data _null_;
          file print;
          put  "Outlier analsys up to the 30 outliers(&obs outliers identified)";
          put  "Robust Regression with &cutoff * sigma cuttoff and removal of expected outliers?";
          put "***********************************";
          put " No outliers using robustreg";
          put "***********************************";
       run;quit;
    %end;

    proc datasets lib=work;
      delete _vvdag:;
    run;quit;

%mend _vdo_outlyr;

%_vdo_outlyr;





/* T002680 GRUBBS TEST FOR OUTLIERS USING SAS */

%macro utl_Grubbs(dsn,var,alpha=.1)
    /des ="Alternating removal of Max and Min Outliers using Grubbs test";

/* output dataset is the input dataset with outlier flag */

/* for testing without macro
   data shoes;set sashelp.shoes(keep=sales);run;
   %let dsn=work.shoes;
   %let var=sales;
   %let alpha=.1;
*/

   %local utl_grbnobs;

   %put %sysfunc(ifc(%sysevalf(%superq(dsn)=,boolean),**** Please Input dataset                   ****,));
   %put %sysfunc(ifc(%sysevalf(%superq(var)=,boolean),**** Please Input Variable                  ****,));

   proc means data=&dsn;
   var &var;
   output out=__out__ mean=mean std=std max=max min=min n=n;
   run;
   /*------------------------------------------------------------*\
   |  Add a unique key so that Outliers can be removed            |
   \*------------------------------------------------------------*/
    Data  Utl_Grubbs00 / View = Utl_Grubbs00;
     Set  &dsn(Keep=&var);
          Utl_GrbKey=_n_;
    Run;
   /*------------------------------------------------------------*\
   |  Sort so that we can trim both ends easily                   |
   \*------------------------------------------------------------*/
    Proc Sort Data=Utl_Grubbs00
              Out =Utl_Grubbs01
              Noequals
              Force;
    By  &var;
   run;
   /*------------------------------------------------------------*\
   |  Need to get number of obds and Type and Length for Key      |
   \*------------------------------------------------------------*/
   Proc Sql Noprint;
      Select Put(Nobs,12.)  into :Utl_GrbNobs
      From   SASHELP.VTable
      Where  Upcase("WORK")           = Upcase(LibName)   AND
             Upcase("Utl_Grubbs01")   = Upcase(MemName)   AND
             Upcase("Data")           = Upcase(MemType);
   quit;
   run;
   /*------------------------------------------------------------*\
   |  Compute the Grubbs Statistics                               |
   \*------------------------------------------------------------*/
   Data
         Utl_Grubbs02
            (
             Keep=
                   GrbTest
                   GrbAlpha
                   GrbObs
                   GrbDrop
                   Min_GrbVals
                   Max_GrbVals
                   Mean_GrbVals
                   Std_GrbVals
                   GrbCalc
                   GrbCrit
                   GrbPStat
            );

     Retain GrbNobs &Utl_GrbNobs GrbAlpha &Alpha;
   /*------------------------------------------------------------*\
   |  Allocate arrays to hold all data                            |
   \*------------------------------------------------------------*/
     Array  GrbVals {&Utl_GrbNobs}  _Temporary_;
     Array  GrbKeys {&Utl_GrbNobs}  _Temporary_;
   /*------------------------------------------------------------*\
   |  Load all Data into Temp Arrays - Temp Arrays can have       |
   |  millions of elements?                                       |
   \*------------------------------------------------------------*/
     GrbN=0;
     Do Until ( Done );
        Set Utl_Grubbs01 End=Done;
        GrbN+1;
        GrbKeys{GrbN} = Utl_GrbKey;
        GrbVals{GrbN} = &var;
     End;
     /*------------------------------------------------------------*\
     | Cannot use more elegant Mean(of Array), Std(Of Array)        |
     | SAS does not support these funtions on Temporary Arrays      |
     \*------------------------------------------------------------*/
     GrbObs=GrbNobs;
     GrbLo=1;
     GrbHi=GrbObs;
     Do  Until ( GrbFlag = 1 )  ;
         GrbFlag=0;
      Link BasicStats;
                   *  Pass=GrbLo,GrbHi,Array GrbVals
                      Return=Mean,Max,Min,Std;
      * Put      "Max Basic Stats ==>  "     /
                  GrbObs        =            /
                  GrbHi         =            /
                  GrbLo         =            /
                  Min_GrbVals   =            /
                  Max_GrbVals   =            /
                  Mean_GrbVals  =            /
                  Std_GrbVals   = ;
      /*------------------------------------------------------------*\
      | Maximum Outlier Case                                         |
      \*------------------------------------------------------------*/
      GrbTest='Max';
      GrbCalc    =( Max_GrbVals - Mean_GrbVals ) / Std_GrbVals ;
      GrbTStat   =TInv(1-&Alpha/GrbNobs,GrbNobs-2);
      Link GrbCompute;
                       * Pass=GrbCalc,GrbTStat,GrbObs
                         Return=GrbCrit GrbStat GrbPStat;
      * put       // "Max Grubbs Stats ==>  "   /
                      GrbCrit=                  /
                      GrbStat=                  /
                      GrbDenom=                 /
                      GrbPStat=;
      If GrbPStat < GrbAlpha Then
          GrbHi = GrbHi - 1;
      Else GrbFlag = GrbFlag + .5;
      * put          //  GrbFlag= //;
      /*------------------------------------------------------------*\
      | Recompute Basic Stats without Outlier                        |
      \*------------------------------------------------------------*/
      Link BasicStats; * (GrbLo,GrbHi);
      *       Put // "Min Basic Stats ==>  " /
                  GrbObs        =            /
                  GrbObs        =            /
                  GrbHi         =            /
                  Min_GrbVals   =            /
                  Max_GrbVals   =            /
                  Mean_GrbVals  =            /
                  Std_GrbVals   = ;
      /*------------------------------------------------------------*\
      | Minimum Outlier Case                                         |
      \*------------------------------------------------------------*/
      GrbTest='Min';
      GrbCalc    =( Mean_GrbVals - Min_GrbVals) / Std_GrbVals ;
      GrbTStat   =TInv(&alpha/GrbNobs,GrbNobs-2);
      Link GrbCompute;
      * Put       // "Min Grubbs Stats ==>  "  /
                        GrbCrit=               /
                        GrbStat=               /
                        GrbDenom=              /
                        GrbPStat=;
      If GrbPStat < GrbAlpha Then
          GrbLo = GrbLo + 1;
      Else GrbFlag = GrbFlag + .5;
      * put          //  GrbFlag= // I=;
  End;
  Stop;
  ENDIT: Put "Cannot Have missing values for Obsevations";
  Stop;
  /*------------------------------------------------------------*\
  |  Compute Grubbs Statistics                                   |
  \*------------------------------------------------------------*/
  GrbCompute:
      GrbCrit    =((GrbObs-1)/sqrt(GrbObs))*sqrt(GrbTStat**2/(GrbObs-2+GrbTStat**2));
      GrbDenom   =(GrbCalc**2*GrbObs-(GrbObs-1)**2);
        Select;
          When ( GrbDenom =.   ) GoTo ENDIT;
          When ( GrbDenom <  0 ) GrbStat=sqrt(-(GrbCalc**2*GrbObs*(GrbObs-2)/GrbDenom));
          When ( GrbDenom  = 0 ) GrbStat=sqrt(GrbCalc**2*GrbObs*(GrbObs-2)/GrbDenom);
        OtherWise ;
        End;
        If  GrbStat =. then GoTo ENDIT;
        GrbPStat=Min(GrbObs*(1-ProbT(GrbStat,GrbObs-2)),1);
        If GrbPStat < GrbAlpha Then Do;
           If GrbTest='Min' Then GrbDrop=GrbKeys{GrbLo};
           Else GrbDrop=GrbKeys{GrbHi};
           Output;
        End;
  Return;
  /*------------------------------------------------------------*\
  |  Compute MeanStandard Deviation                              |
  \*------------------------------------------------------------*/
  BasicStats:
      GrbObs=( GrbHi - GrbLo + 1 );
      SumSq_GrbVals=0;
      Sum_GrbVals=0;
      Do I = GrbLo To GrbHi;
         SumSq_GrbVals +  GrbVals{I}**2;
         Sum_GrbVals   +  GrbVals{I};
      End;
      Std_GrbVals=Sqrt(
                       (  GrbHi*SumSq_GrbVals - Sum_GrbVals**2 ) /
                       (  GrbObs* ( GrbObs- 1 )                )
                      );
      Max_GrbVals=GrbVals{GrbHi};
      Min_GrbVals=GrbVals{GrbLo};
      Mean_GrbVals= Sum_GrbVals / GrbObs;
  Return;
  Run;


  Proc print Data=utl_Grubbs02  Width=Min;
    Format GrbPStat 6.4;
    Var
      GrbTest
      GrbAlpha
      GrbObs
      GrbDrop
      Min_GrbVals
      Max_GrbVals
      Mean_GrbVals
      Std_GrbVals
      GrbCalc
      GrbCrit
      GrbPStat;
   Run;
  /*------------------------------------------------------------*\
  | Put in same order as original input                          |
  \*------------------------------------------------------------*/
  Proc Sort
       Data = Utl_Grubbs01
       Out  = Utl_Grubbs01(Index= ( Utl_GrbKey / Unique ));
  By   Utl_GrbKey;
  Run;
  /*------------------------------------------------------------*\
  | Tag the Outliers ( Use SQL just to keep skills up )          |
  \*------------------------------------------------------------*/
  Proc Sql;
      Alter  Table  Utl_Grubbs01
      Modify _Outlier_ num;
      Update Utl_Grubbs01
      Set    _Outlier_ =
        (
         Select  1
         From    utl_Grubbs02
         Where   GrbDrop = Utl_GrbKey
         );
   Quit;
   Run;
  /*------------------------------------------------------------*\
  | Concatenate _Outlier_ onto original raw data                 |
  \*------------------------------------------------------------*/
  options mergenoby=nowarn;
  Data &dsn;
     Merge &dsn
           Utl_Grubbs01(Keep=_Outlier_);
    /*------------------------------------------------------------*\
    |  No By Statement ( Data in Exactly the same order )          |
    |  By Variable not on Raw Input Table                          |
    \*------------------------------------------------------------*/
      If _Outlier_ = . then _Outlier_ = 0;
  Run;
  options mergenoby=warn;
  Proc Print
             Data=&dsn(where=(_Outlier_=1));
  run;
%mend Utl_Grubbs;

data bweight;set sashelp.bweight(keep=weight);run;
%utl_Grubbs(work.bweight,weight,alpha=.05);


/* T1001580 Latest VooDoo oto_voodoo check out  dataset
I run the macro oto_voodoo on any new data sent to me

see for more documentation
https://drive.google.com/file/d/0ByX2ii2B0Rq9Y003ZXZFT1pBV1U/view?usp=sharing


I run this on any new data sent to me

I don't think this will run under server EG, best with 'old text editor?'

WHAT THE MACRO DOES

1.   Dataset level summary -- ie number of obs, variable types, static data

2.   Cardinality page  (primary keys, codes/decodes  number of unique
     values for every variable - EG fails here)

3.   Complete frequency for all variables numeric and character with less
     than 200 levels

4.   For variables with over 200 levels top 100 most frequent and bottom
     100 least frequent
     Least frequent are the more interesting cases.

5.   Proc means on all numeric variables

6.   Proc univariate on all numeric variables

7.   Special datetime  analysis

9.   Histograms on all variables with less than 200 levels

10.  Proc contents

11.  Frequency of all numeric variables Missing, negative, zero and positive

12.  Duplicates on single or compound key. Output printed vertically for
     easy comparison

13.  Cross tabs of every variable with every other variable top 16 levels
     (if selectd)

14.  You can also select one variable to cross tab with all other variables
     max top 16 levels

16.  Maximum and minimum lengths to hold all numeric and character variables
     exactly (optimize)

17.  Correlation of all pairs of numeric variables sorted by largest
     correlation to lowest.

18.  Nice display of max and mins for numeric and character in one table

19.  List of identical columns ie date and date1 have equal values on all
     observations

19   One to Many, Many to One, One to Many and Many to Many

20   Cochran-Mantel-Haenszel Statistics

21   Finds missing patterns  (missing pattern frequncies)

22   Printout of first 20, middle 20 and last 20 observations.

23.  Correlation of all pairs of character variables (CMH)
     .

/* for easy editing here are the locations macros un oto_voodoo
prefix area helps
 %macro utlnopts           56
 %macro _vdo_macnam        93
 %macro utlfkil            109
 %macro nobs               158
 %macro nvar               199
 %macro _vdo_cdedec        274
 %macro _vv_annxtb         306
 %macro _vdo_basic         418
 %macro _vdo_optlen        3051
 %macro _vdo_getmaxmin     3144
 %macro _vdo_getmaxmin001  3169
 %macro _vdo_begmidend     3237
 %macro _vdo_clean         3331
 %macro _vdo_chartx        3402
 %macro _vdo_mispop        3606
 %macro _vdo_keyunq        3653
 %macro _vdo_dupcol        3734
 %macro _vdo_cor           3819
 %macro _vdo_mnymny        3882
 %macro _vdo_relhow        3897
 %macro _vdo_cmh           4020
 %macro _vdo_tabone        4116
 %macro _vdo_taball        4181
 %macro _vdo_unqtwo        4261
 %macro utl_getstm         4463
 %macro DirExist           4476
 %macro utlvdoc            4488
 %macro _vdo_unichr        4495



I run the macro 'oto_voodoo' on any new data sent to me

see for more documentation
https://drive.google.com/file/d/0ByX2ii2B0Rq9Y003ZXZFT1pBV1U/view?usp=sharing


I run this on any new data sent to me

I don't think this will run under server EG, best with 'old text editor?'
Compute and memory intensive, runs ok on a power workstation

WHAT THE MACRO DOES

1.   Dataset level summary -- ie number of obs, variable types, static data

2.   Cardinality page  (primary keys, codes/decodes  number of unique
     values for every variable - EG often fails here)

3.   Complete frequency for all variables numeric and character with less
     than 200 levels

4.   For variables with over 200 levels top 100 most frequent and bottom
     100 least frequent
     Least frequent are the more interesting cases.

5.   Proc means on all numeric variables

6.   Proc univariate on all numeric variables

7.   Special datetime  analysis

9.   Histograms on all variables with less than 200 levels

10.  Proc contents

11.  Frequency of all numeric variables Missing, negative, zero and positive

12.  Duplicates on single or compound key. Output printed vertically for
     easy comparison

13.  Cross tabs of every variable with every other variable top 16 levels
     (if selectd)

14.  You can also select one variable to cross tab with all other variables
     max top 16 levels

16.  Maximum and minimum lengths to hold all numeric and character variables
     exactly (optimize)

17.  Correlation of all pairs of numeric variables sorted by largest
     correlation to lowest.

18.  Nice display of max and mins for numeric and character in one table

19.  List of identical columns ie date and date1 have equal values on all
     observations

19   One to Many, Many to One, One to Many and Many to Many

20   Cochran-Mantel-Haenszel Statistics

21   Finds missing patterns  (missing pattern frequncies)

22   Printout of first 20, middle 20 and last 20 observations.

23.  Correlation of all pairs of character variables (CMH)
     .




Super User
Posts: 9,691

Re: How to identify and remove outlier?

1) proc robustreg 

2) Check proc glm's OUTPUT statement ,
especially the COOKD H RESIDUAL .... keywords.



Super Contributor
Posts: 426

Re: How to identify and remove outlier?

I'm a novice user to statistics. Could you please clarify me with one simple example to remove outliers with proc robustreg and proc glm?

 

Thank you for any inputs you offer me.

Super User
Posts: 9,691

Re: How to identify and remove outlier?

For proc robustreg, check the example in documentation.
For proc glm, check COOKD, if it is very large compared to other obs,then this obs should be outlier.




Ask a Question
Discussion stats
  • 6 replies
  • 1407 views
  • 4 likes
  • 5 in conversation