I need to calculate outlier (2 standard deviation). To get the desire output, I want to calculate mean and standard deviation by using the group and condition.
If condition match, I want to assign that value for group (or I want to use that value to calculate lower and upper limit)
Lower_limit = mean - 2 * sd
Upper_limit = mean + 2 * sd
If var2 is less than lower limit or greater than upper limit than I want to delete that observation from my dataset.
This is what I want:
var1
var1 | var2 | mean | sd | lower_limit | upper_limit |
b | 1 | 3.4 | 3.97 | -4.54 | 11.34 |
b | 10 | 3.4 | 3.97 | -4.54 | 11.34 |
b | 2 | 3.4 | 3.97 | -4.54 | 11.34 |
b | 4 | 3.4 | 3.97 | -4.54 | 11.34 |
b | 0 | 3.4 | 3.97 | -4.54 | 11.34 |
c | 3 | . | . | . | . |
c | 3 | . | . | . | . |
c | 2 | . | . | . | . |
c | 1 | . | . | . | . |
c | 1 | . | . | . | . |
In this example, I do not have any demand less than lower limit or greater than upper limit. So I need to keep all observation.
data have;
input var1 var2;
datalines;
b 1
b 10
b 2
b 4
b 0
c 3
c 3
c 2
c 1
c 1
;
run;
proc sql;
create table want as
select*, avg(var2) as mean format 10.3, std(var2) as sd format 10.3
from have
group by var1
having var2 > 4;
quit
*cannot get the desire (same) value by group.
*delete observations based on lower and upper limit creteria.
data want;
set want;
lower_limit = mean - (2*sd);
upper_limit = mean + (2*sd);
if demand = . then output;
if demand < lower_limit then delete;
if demand > upper_limit then delete;
run;
What is your question?
How to calculate mean and standard deviation based on the condition by group?
data have;
input group $ var2;
datalines;
b 1
b 10
b 2
b 4
b 0
c 3
c 3
c 2
c 1
c 1
;
proc summary nway data=have;
class group;
var var2;
output out=_stats_ mean=mean stddev=sd;
run;
data want;
if _n_=1 then set _stats_(drop=_:);
set have;
/* If you want the confidence limits, you type the formula here */
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.