I have the following data set, and I used the code below to flag outliers, defined as values of ps_n more than 1.5 standard deviations from the mean within each bug_drug group:
data Exposure;
input Hospital $ Bug $ Drug $ Bug_Drug $ it_n ps_n ns_n;
datalines;
ABC Ecoli MC Ecoli_MC 122 0.5 61
ABC Ecoli MN Ecoli_MN 34 0.5 17
ABC Kleb MN Kleb_MN 200 0.9 180
ABC Kleb MC Kleb_MC 55 0.8 44
ABC Auris MN Auris_MN 143 0.01 1.43
ABC Auris MC Auris_MC 13 0.7 9.1
ABC Auris FL Auris_FL 500 0.7 350
ABC Ecoli FL Ecoli_FL 69 0.1 6.9
ABC Kleb FL Kleb_FL 113 0.4 45.2
XYZ Kleb MN Kleb_MN 100 0.6 60
XYZ Ecoli MC Ecoli_MC 233 1 233
XYZ Kleb MC Kleb_MC 33 1 33
XYZ Ecoli FL Ecoli_FL 112 1 112
XYZ Kleb FL Kleb_FL 100 0.8 80
XYZ Ecoli MN Ecoli_MN 212 1 212
XYZ Auris MN Auris_MN 78 0.9 70.2
XYZ Auris FL Auris_FL 23 1 23
RTY Kleb MN Kleb_Mn 50 0.6 30
RTY Ecoli MC Ecoli_MC 230 0.1 23
RTY Kleb MC Kleb_MC 440 0.8 352
RTY Kleb FL Kleb_FL 56 0.8 44.8
RTY Ecoli MN Ecoli_MN 20 0.9 18
RTY Ecoli FL Ecoli_FL 40 0.5 20
RTY Auris FL Auris_FL 29 0.8 23.2
RTY Auris MN Auris_MN 88 0.9 79.2
RTY Auris MC Auris_MC 90 0.1 9
HOW Kleb MN Kleb_Mn 50 0.4 20
HOW Ecoli MC Ecoli_MC 90 0.8 72
HOW Kleb MC Kleb_MC 66 0.1 6.6
HOW Kleb FL Kleb_FL 70 0.1 7
HOW Ecoli MN Ecoli_MN 389 0.3 116.7
HOW Ecoli FL Ecoli_FL 120 0.7 84
HOW Auris FL Auris_FL 35 1 35
HOW Auris MN Auris_MN 99 0.2 19.8
HOW Auris MC Auris_MC 20 0.1 2
CVS Kleb MN Kleb_Mn 50 0.4 20
CVS Ecoli MC Ecoli_MC 312 0.4 124.8
CVS Kleb MC Kleb_MC 44 0.6 26.4
CVS Kleb FL Kleb_FL 300 0.5 150
CVS Ecoli MN Ecoli_MN 60 0.4 24
CVS Ecoli FL Ecoli_FL 100 0.7 70
CVS Auris FL Auris_FL 78 0.1 7.8
CVS Auris MN Auris_MN 344 0.2 68.8
CVS Auris MC Auris_MC 789 0.6 473.4
;;;
run;
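/* BY-group processing below requires the data sorted by bug_drug */
proc sort data=Exposure;
  by bug_drug;
run;
/* mean and standard deviation of ps_n for each bug_drug group */
proc univariate data=Exposure;
  by bug_drug;
  var ps_n;
  histogram;
  output out=means mean=ps_mean std=ps_std;
run;
/* attach the group statistics to every row */
data MDRO_Report_2021;
  merge Exposure means;
  by bug_drug;
run;
/* list rows more than 1.5 standard deviations from their group mean */
proc print data=MDRO_Report_2021 noobs;
  where abs(ps_n-ps_mean) > 1.5*ps_std;
  by bug_drug;
  var hospital drug bug bug_drug it_n ps_n ns_n;
run;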
Running this code detects the outliers, but I need a way to remove them from the data set programmatically instead of deleting them one by one from the original data.
Thank you.
Does the code below return what you're after?
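data want;
  /* Exposure must already be sorted by bug_drug (see the PROC SORT above) */
  merge Exposure means;
  by bug_drug;
  /* drop rows more than 1.5 standard deviations from the group mean */
  if abs(ps_n-ps_mean) > 1.5*ps_std then delete;
  drop ps_mean ps_std;
run;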
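If a single step is preferred, the same filter can probably be written in PROC SQL, which remerges the group statistics with the detail rows automatically. This is only a sketch under the same 1.5 standard deviation rule; the output name want_sql is made up for illustration and is not from the original post.

```sas
/* Sketch: keep only rows within 1.5 standard deviations of their group mean. */
/* want_sql is a hypothetical output table name.                              */
proc sql;
  create table want_sql as
  select *
  from Exposure
  group by bug_drug
  having abs(ps_n - mean(ps_n)) <= 1.5 * std(ps_n);
quit;
```

SAS writes a note to the log that the query requires remerging summary statistics back with the original data; that is expected here. No prior sort or separate means data set is needed. Both this query and the BY-group merge compare bug_drug values case-sensitively, so Kleb_Mn and Kleb_MN in the sample data are treated as separate groups. If that is a typo, normalizing the values first (for example with upcase) is one option.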