I need to calculate outlier (2 standard deviation). To get the desire output, I want to calculate mean and standard deviation by using the group and condition.
If condition match, I want to assign that value for group (or I want to use that value to calculate lower and upper limit)
Lower_limit = mean - 2 * sd
Upper_limit = mean + 2 * sd
If var2 is less than lower limit or greater than upper limit than I want to delete that observation from my dataset.
This is what I want:
var1
var1 | var2 | mean | sd | lower_limit | upper_limit |
b | 1 | 3.4 | 3.97 | -4.54 | 11.34 |
b | 10 | 3.4 | 3.97 | -4.54 | 11.34 |
b | 2 | 3.4 | 3.97 | -4.54 | 11.34 |
b | 4 | 3.4 | 3.97 | -4.54 | 11.34 |
b | 0 | 3.4 | 3.97 | -4.54 | 11.34 |
c | 3 | . | . | . | . |
c | 3 | . | . | . | . |
c | 2 | . | . | . | . |
c | 1 | . | . | . | . |
c | 1 | . | . | . | . |
In this example, I do not have any demand less than lower limit or greater than upper limit. So I need to keep all observation.
data have;
input var1 var2;
datalines;
b 1
b 10
b 2
b 4
b 0
c 3
c 3
c 2
c 1
c 1
;
run;
proc sql;
create table want as
select*, avg(var2) as mean format 10.3, std(var2) as sd format 10.3
from have
group by var1
having var2 > 4;
quit
*cannot get the desire (same) value by group.
*delete observations based on lower and upper limit creteria.
data want;
set want;
lower_limit = mean - (2*sd);
upper_limit = mean + (2*sd);
if demand = . then output;
if demand < lower_limit then delete;
if demand > upper_limit then delete;
run;
What is your question?
How to calculate mean and standard deviation based on the condition by group?
data have;
input group $ var2;
datalines;
b 1
b 10
b 2
b 4
b 0
c 3
c 3
c 2
c 1
c 1
;
proc summary nway data=have;
class group;
var var2;
output out=_stats_ mean=mean stddev=sd;
run;
data want;
if _n_=1 then set _stats_(drop=_:);
set have;
/* If you want the confidence limits, you type the formula here */
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.