Solved: Remove detected outliers from dataset

mayasak · Posted 08-09-2023 07:40 PM

I have the following data set, and I used the code below to detect outliers, defined as ps_n values more than ±1.5 standard deviations from the mean of their Bug_Drug group:

data Exposure;
input Hospital $ Bug $ Drug $ Bug_Drug $ it_n ps_n ns_n;
datalines;
ABC Ecoli MC Ecoli_MC 122 0.5 61
ABC Ecoli MN Ecoli_MN 34 0.5 17
ABC Kleb MN Kleb_MN 200 0.9 180
ABC Kleb MC Kleb_MC 55 0.8 44
ABC Auris MN Auris_MN  143 0.01 1.43
ABC Auris MC Auris_MC  13 0.7 9.1
ABC Auris FL Auris_FL  500 0.7 350
ABC Ecoli FL Ecoli_FL 69 0.1 6.9
ABC Kleb FL Kleb_FL 113 0.4 45.2
XYZ Kleb MN Kleb_MN 100 0.6 60
XYZ Ecoli MC Ecoli_MC 233 1 233
XYZ Kleb MC Kleb_MC 33 1 33
XYZ Ecoli FL Ecoli_FL 112 1 112
XYZ Kleb FL Kleb_FL 100 0.8 80
XYZ Ecoli MN Ecoli_MN 212 1 212
XYZ Auris MN Auris_MN 78 0.9 70.2
XYZ Auris FL Auris_FL 23 1 23
RTY Kleb MN Kleb_MN 50 0.6 30
RTY Ecoli MC Ecoli_MC 230 0.1 23
RTY Kleb MC Kleb_MC 440 0.8 352
RTY Kleb FL Kleb_FL 56 0.8 44.8
RTY Ecoli MN Ecoli_MN 20 0.9 18
RTY Ecoli FL Ecoli_FL 40 0.5 20
RTY Auris FL Auris_FL 29 0.8 23.2
RTY Auris MN Auris_MN 88 0.9 79.2
RTY Auris MC Auris_MC 90 0.1 9
HOW Kleb MN Kleb_MN 50 0.4 20
HOW Ecoli MC Ecoli_MC 90 0.8 72
HOW Kleb MC Kleb_MC 66 0.1 6.6
HOW Kleb FL Kleb_FL 70 0.1 7
HOW Ecoli MN Ecoli_MN 389 0.3 116.7
HOW Ecoli FL Ecoli_FL 120 0.7 84
HOW Auris FL Auris_FL 35 1 35
HOW Auris MN Auris_MN 99 0.2 19.8
HOW Auris MC Auris_MC 20 0.1 2
CVS Kleb MN Kleb_MN 50 0.4 20
CVS Ecoli MC Ecoli_MC 312 0.4 124.8
CVS Kleb MC Kleb_MC 44 0.6 26.4 
CVS Kleb FL Kleb_FL 300 0.5 150
CVS Ecoli MN Ecoli_MN 60 0.4 24
CVS Ecoli FL Ecoli_FL 100 0.7 70
CVS Auris FL Auris_FL 78 0.1 7.8
CVS Auris MN Auris_MN 344 0.2 68.8
CVS Auris MC Auris_MC 789 0.6 473.4
;;;
run;
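/* BY-group processing needs the data sorted by the grouping variable */
proc sort data= Exposure;
  by bug_drug;
run;
/* Per-group mean and standard deviation of ps_n, saved to MEANS */
proc univariate data=Exposure;
  by bug_drug;
  var ps_n;
  histogram;
  output out=means mean=ps_mean std=ps_std;
run;
/* Attach the group statistics to every row of its group */
data MDRO_Report_2021;
  merge Exposure means;
  by bug_drug;
run;
/* List rows more than 1.5 standard deviations from their group mean */
proc print data=MDRO_Report_2021 noobs;
  where abs(ps_n-ps_mean) > 1.5*ps_std ;
  by bug_drug;
  var hospital drug bug bug_drug it_n ps_n ns_n;
run;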
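As a cross-check outside SAS, here is a minimal Python sketch of the same detection rule using only the standard library. It assumes the rule is |ps_n - group mean| > 1.5 * group standard deviation, with the sample standard deviation (n - 1 divisor), which is what PROC UNIVARIATE reports by default (VARDEF=DF). The rows are the complete Kleb_MC and Auris_FL groups copied from the Exposure data; the function names `group_stats` and `flag_outliers` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Complete Kleb_MC and Auris_FL groups from the Exposure data:
# (Hospital, Bug_Drug, ps_n)
EXPOSURE = [
    ("ABC", "Kleb_MC", 0.8),
    ("XYZ", "Kleb_MC", 1.0),
    ("RTY", "Kleb_MC", 0.8),
    ("HOW", "Kleb_MC", 0.1),
    ("CVS", "Kleb_MC", 0.6),
    ("ABC", "Auris_FL", 0.7),
    ("XYZ", "Auris_FL", 1.0),
    ("RTY", "Auris_FL", 0.8),
    ("HOW", "Auris_FL", 1.0),
    ("CVS", "Auris_FL", 0.1),
]


def group_stats(rows):
    """Mean and sample std of ps_n per Bug_Drug, like the MEANS data set."""
    values = defaultdict(list)
    for _hospital, bug_drug, ps_n in rows:
        values[bug_drug].append(ps_n)
    # stdev() divides by n - 1, matching PROC UNIVARIATE's default VARDEF=DF.
    # Assumes every group has at least two rows (true for these groups).
    return {group: (mean(vals), stdev(vals)) for group, vals in values.items()}


def flag_outliers(rows, k=1.5):
    """Rows the PROC PRINT WHERE clause would list: |ps_n - mean| > k * std."""
    stats = group_stats(rows)
    flagged = []
    for row in rows:
        group_mean, group_std = stats[row[1]]
        if abs(row[2] - group_mean) > k * group_std:
            flagged.append(row)
    return flagged


if __name__ == "__main__":
    for row in flag_outliers(EXPOSURE):
        print(row)
```

On these rows the sketch flags HOW's Kleb_MC value and CVS's Auris_FL value (both ps_n = 0.1). Because these are the full groups, the SAS PROC PRINT should list the same two rows for those Bug_Drug values.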

This code detects the outliers, but I need a way to remove them from the dataset programmatically instead of deleting them one by one from the original dataset.

Thank you.

Patrick · Posted 08-09-2023 07:54 PM

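Does the code below return what you're after?

data want;
  merge Exposure means;
  by bug_drug;
  /* Drop rows more than 1.5 standard deviations from their group mean */
  if abs(ps_n-ps_mean) > 1.5*ps_std then delete;
  /* Remove the helper statistics from the output */
  drop ps_mean ps_std;
run;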
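One caveat with the DELETE condition: if a Bug_Drug group has a single row, PROC UNIVARIATE cannot compute a standard deviation, so ps_std is missing. SAS treats a missing value as smaller than any number, so the comparison is true and that row is deleted. A guarded version of the statement is `if not missing(ps_std) and abs(ps_n-ps_mean) > 1.5*ps_std then delete;`. The Python sketch below mirrors that guarded logic and keeps the removed rows in a second list instead of discarding them; `remove_outliers` is a hypothetical name, and the rows are the Kleb_MC group from the Exposure data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Kleb_MC group from the Exposure data: (Hospital, Bug_Drug, ps_n)
EXPOSURE = [
    ("ABC", "Kleb_MC", 0.8),
    ("XYZ", "Kleb_MC", 1.0),
    ("RTY", "Kleb_MC", 0.8),
    ("HOW", "Kleb_MC", 0.1),
    ("CVS", "Kleb_MC", 0.6),
]


def remove_outliers(rows, k=1.5):
    """Split rows into (kept, removed) using |ps_n - mean| > k * std per group."""
    values = defaultdict(list)
    for _hospital, bug_drug, ps_n in rows:
        values[bug_drug].append(ps_n)
    stats = {}
    for group, vals in values.items():
        # A one-row group has no sample std (SAS would store a missing value).
        group_std = stdev(vals) if len(vals) > 1 else None
        stats[group] = (mean(vals), group_std)
    kept, removed = [], []
    for row in rows:
        group_mean, group_std = stats[row[1]]
        # Only compare when a std exists, like `not missing(ps_std)` in SAS.
        if group_std is not None and abs(row[2] - group_mean) > k * group_std:
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed


if __name__ == "__main__":
    kept, removed = remove_outliers(EXPOSURE)
    print("kept:", kept)
    print("removed:", removed)
```

For this group only HOW's row (ps_n = 0.1) is removed; the other four rows are kept.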

mayasak · Posted 08-09-2023 08:30 PM

Exactly. Thank you
