Solved: how to remove outliers in each sub group?

JKCho · Posted 04-03-2022 12:22 PM

I obtained the data like the above. What I am trying to do is to remove the top/bottom 1 percentile for each quarter in each year.

I have tried to find in this community but could not find any about removing outliers in sub-categories. I guess I can specify a variable to be removed (gaap and non-g in this case) and can also specify subgroups (QTR and year).

Thank you for your help!! 🙂

PaigeMiller · Posted 04-03-2022 01:31 PM

To identify the TOP and BOTTOM 1-percentile of each year and quarter, use PROC RANK.

proc rank data=have out=ranked groups=100;
    by year qtr;
    var gaap non_g;
    ranks r_gaap r_non_g;
run;

This creates 100 groups in variables r_gaap and r_non_g in data set RANKED, each with numbers 0 to 99 within each YEAR and QTR combination, where 0 is the 1-st percentile. From there, a simple data step with a WHERE statement will remove the bottom 1-percentile and top 1-percentile within each year and qtr.

Better yet (in my opinion) instead of removing observations: turn the lowest 1-percentile and highest 1-percentile into missing values, as removing observations will be problems if for an observation gaap is lowest 1-percentile but non_g is not lowest 1-percentile.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 04-03-2022 01:31 PM

To identify the TOP and BOTTOM 1-percentile of each year and quarter, use PROC RANK.

proc rank data=have out=ranked groups=100;
    by year qtr;
    var gaap non_g;
    ranks r_gaap r_non_g;
run;

This creates 100 groups in variables r_gaap and r_non_g in data set RANKED, each with numbers 0 to 99 within each YEAR and QTR combination, where 0 is the 1-st percentile. From there, a simple data step with a WHERE statement will remove the bottom 1-percentile and top 1-percentile within each year and qtr.

Better yet (in my opinion) instead of removing observations: turn the lowest 1-percentile and highest 1-percentile into missing values, as removing observations will be problems if for an observation gaap is lowest 1-percentile but non_g is not lowest 1-percentile.

--
Paige Miller

JKCho · Posted 04-04-2022 07:52 PM

oh, this way is gorgeous!
I just come up with
data=one set=have;
if year=2000; qtr=1;
run;

and making something like the above 80 times... you saved me! thank you 🙂

PaigeMiller · Posted 04-05-2022 05:50 AM

@JKCho wrote:
oh, this way is gorgeous!
I just come up with
data=one set=have;
if year=2000; qtr=1;
run;

and making something like the above 80 times... you saved me! thank you 🙂

@JKCho that's ridiculous to do it 80 times. One DATA step does it for all combinations of YEAR and QTR.

--
Paige Miller

how to remove outliers in each sub group?

Re: how to remove outliers in each sub group?

Re: how to remove outliers in each sub group?

Re: how to remove outliers in each sub group?

Re: how to remove outliers in each sub group?

how to remove outliers in each sub group?

Re: how to remove outliers in each sub group?

Re: how to remove outliers in each sub group?

Re: how to remove outliers in each sub group?

Re: how to remove outliers in each sub group?

Registration is open

SAS Training: Just a Click Away