BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JKCho
Pyrite | Level 9

3.PNG

 

I obtained the data like the above. What I am trying to do is to remove the top/bottom 1 percentile for each quarter in each year. 

I have tried to find in this community but could not find any about removing outliers in sub-categories. I guess I can specify a variable to be removed (gaap and non-g in this case) and can also specify subgroups (QTR and year). 

 

Thank you for your help!! 🙂

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

To identify the TOP and BOTTOM 1-percentile of each year and quarter, use PROC RANK.

 

proc rank data=have out=ranked groups=100;
    by year qtr;
    var gaap non_g;
    ranks r_gaap r_non_g;
run;

This creates 100 groups in variables r_gaap and r_non_g in data set RANKED, each with numbers 0 to 99 within each YEAR and QTR combination, where 0 is the 1-st percentile. From there, a simple data step with a WHERE statement will remove the bottom 1-percentile and top 1-percentile within each year and qtr.

 

Better yet (in my opinion) instead of removing observations: turn the lowest 1-percentile and highest 1-percentile into missing values, as removing observations will be problems if for an observation gaap is lowest 1-percentile but non_g is not lowest 1-percentile.

--
Paige Miller

View solution in original post

3 REPLIES 3
PaigeMiller
Diamond | Level 26

To identify the TOP and BOTTOM 1-percentile of each year and quarter, use PROC RANK.

 

proc rank data=have out=ranked groups=100;
    by year qtr;
    var gaap non_g;
    ranks r_gaap r_non_g;
run;

This creates 100 groups in variables r_gaap and r_non_g in data set RANKED, each with numbers 0 to 99 within each YEAR and QTR combination, where 0 is the 1-st percentile. From there, a simple data step with a WHERE statement will remove the bottom 1-percentile and top 1-percentile within each year and qtr.

 

Better yet (in my opinion) instead of removing observations: turn the lowest 1-percentile and highest 1-percentile into missing values, as removing observations will be problems if for an observation gaap is lowest 1-percentile but non_g is not lowest 1-percentile.

--
Paige Miller
JKCho
Pyrite | Level 9
oh, this way is gorgeous!
I just come up with
data=one set=have;
if year=2000; qtr=1;
run;

and making something like the above 80 times... you saved me! thank you 🙂
PaigeMiller
Diamond | Level 26

@JKCho wrote:
oh, this way is gorgeous!
I just come up with
data=one set=have;
if year=2000; qtr=1;
run;

and making something like the above 80 times... you saved me! thank you 🙂

@JKCho that's ridiculous to do it 80 times. One DATA step does it for all combinations of YEAR and QTR.

--
Paige Miller

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1771 views
  • 1 like
  • 2 in conversation