Hi All,,
I'm new to SAS and my task is to calculate shares of outliers for my date. I have already prepared the following code but stopped after joining table.
Creating descriptive statistics
title 'Descriptive statistics for each variable from "valid" dataset';
ods exclude _all_;
proc means data=valid
n min p1 p5 p10 q1 p50 q3 p90 p95 p99 max mean std qrange range
STACKODSOUTPUT;
var _numeric_ ;
ods output summary=summary_valid;
run;
ods exclude none;
Creating a table with outlier bounds data
proc sql;
create table outlier_valid as
select variable, Q1-1.5*QRANGE as lb_mid, Q3+1.5*QRANGE as ub_mid, Q1-3*QRANGE as lb_extr, Q3+3*QRANGE as ub_extr
from summary_valid
order by variable desc, variable;
quit;
proc print data=outlier_valid noobs;
title1 'The summary of outliers data valid';
title2 'lb_mid = lower bound for mid outliers';
title3 'ub_mid = upper bound for mid outliers';
title4 'lb_extr = lower bound for extreme outliers';
title5 'ub_extr = upper bound for extreme outliers';
run;
Joining two above tables
proc sql;
create table train_join as
select s.variable, s.n, s.MIN, s.P1, s.P5, s.P10, s.Q1, s.P50, S.Q3, s.P90, s.P95, s.P99, s.MAX, s.MEAN, s.STDDEV, s.QRANGE, s.RANGE, o.lb_mid, o.ub_mid, o.lb_extr, o.ub_extr
from summary_train s, outlier_train o
where s.variable = o.variable;
quit;
Now I want to create a variable representing count of outliers which:
are lower than lb_mid
are higher than up_mid
are lower than lb_extr
are higher than up_extr
I would also like to create variable mild_outliers (true/false) and extrr_outliers (true/false)
Could you please give me some tips?
Kind regards,
Cezary