Hello,
I am trying to identify outliers in a dataset that has multiple BY variables. Please see an example of the data below:
ID | Measure | Date | Numerator | Denominator |
1 | A | 1-Jan | 1 | 0 |
1 | A | 2-Jan | 50 | 1 |
1 | A | 3-Jan | 2 | 80 |
1 | B | 1-Jan | 1 | 1 |
1 | B | 2-Jan | 50 | 50 |
1 | B | 3-Jan | 2 | 2 |
2 | A | 1-Jan | 1 | 1 |
2 | A | 2-Jan | 2 | 2 |
2 | B | 1-Jan | 1 | 1 |
2 | B | 2-Jan | 2 | 0 |
3 | A | 1-Jan | 1 | 1 |
3 | A | 2-Jan | 2 | 0 |
3 | B | 1-Jan | 1 | 2 |
3 | B | 2-Jan | 50 | 3 |
3 | C | 1-Jan | 2 | 1 |
3 | C | 2-Jan | 2 | 1 |
I'm trying to identify outliers in the Numerator and Denominator values within each combination of ID and Measure. So far, I have the following code, but it's not producing the desired results.
Also, is there a way to create a separate table containing n, median, qrange, p25, and p75 by ID and Measure?
Any assistance would be greatly appreciated. Thank you!
proc means data=have n median qrange p25 p75;
    var Numerator;
    class ID Measure;
    ods output summary=ranges;
run;

data Out;
    set have;
    Outlier = ifc(Numerator > (Numerator*3), 'Y', 'N');
run;
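
One possible way to approach both questions, offered as a sketch under assumptions rather than a definitive answer: the OUTPUT statement in PROC MEANS writes the statistics to a data set with one row per ID and Measure, and merging that data set back onto the detail rows lets each value be compared with its own group's quartiles. The 1.5 * IQR fence used here is an assumed outlier rule (the post does not define one), and the names have_sorted, flagged, and the num_/den_ prefixed variables are hypothetical.

```sas
/* Per-group statistics, one row per ID/Measure combination.            */
/* NWAY keeps only the full ID*Measure crossing; NOPRINT skips listing. */
proc means data=have nway noprint;
    class ID Measure;
    var Numerator Denominator;
    output out=ranges (drop=_type_ _freq_)
        n=      num_n      den_n
        median= num_median den_median
        qrange= num_qrange den_qrange
        p25=    num_p25    den_p25
        p75=    num_p75    den_p75;
run;

/* MERGE with BY requires the detail data to be sorted by the keys. */
proc sort data=have out=have_sorted;
    by ID Measure;
run;

data flagged;
    merge have_sorted ranges;
    by ID Measure;
    length num_outlier den_outlier $1;

    /* Assumed rule: flag values more than 1.5 * IQR outside the quartiles. */
    /* Missing values are never flagged.                                    */
    num_outlier = ifc(not missing(Numerator) and
                      (Numerator < num_p25 - 1.5*num_qrange or
                       Numerator > num_p75 + 1.5*num_qrange), 'Y', 'N');
    den_outlier = ifc(not missing(Denominator) and
                      (Denominator < den_p25 - 1.5*den_qrange or
                       Denominator > den_p75 + 1.5*den_qrange), 'Y', 'N');
run;
```

The original comparison, Numerator > (Numerator*3), tests each value against three times itself, so it can only be true for negative values. With only two or three rows per group, as in the sample data, the quartiles are rough estimates, so the flags are more meaningful on larger groups.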