- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I obtained the data like the above. What I am trying to do is to remove the top/bottom 1 percentile for each quarter in each year.
I have tried to find in this community but could not find any about removing outliers in sub-categories. I guess I can specify a variable to be removed (gaap and non-g in this case) and can also specify subgroups (QTR and year).
Thank you for your help!! 🙂
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To identify the TOP and BOTTOM 1-percentile of each year and quarter, use PROC RANK.
proc rank data=have out=ranked groups=100;
by year qtr;
var gaap non_g;
ranks r_gaap r_non_g;
run;
This creates 100 groups in variables r_gaap and r_non_g in data set RANKED, each with numbers 0 to 99 within each YEAR and QTR combination, where 0 is the 1-st percentile. From there, a simple data step with a WHERE statement will remove the bottom 1-percentile and top 1-percentile within each year and qtr.
Better yet (in my opinion) instead of removing observations: turn the lowest 1-percentile and highest 1-percentile into missing values, as removing observations will be problems if for an observation gaap is lowest 1-percentile but non_g is not lowest 1-percentile.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
To identify the TOP and BOTTOM 1-percentile of each year and quarter, use PROC RANK.
proc rank data=have out=ranked groups=100;
by year qtr;
var gaap non_g;
ranks r_gaap r_non_g;
run;
This creates 100 groups in variables r_gaap and r_non_g in data set RANKED, each with numbers 0 to 99 within each YEAR and QTR combination, where 0 is the 1-st percentile. From there, a simple data step with a WHERE statement will remove the bottom 1-percentile and top 1-percentile within each year and qtr.
Better yet (in my opinion) instead of removing observations: turn the lowest 1-percentile and highest 1-percentile into missing values, as removing observations will be problems if for an observation gaap is lowest 1-percentile but non_g is not lowest 1-percentile.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I just come up with
data=one set=have;
if year=2000; qtr=1;
run;
and making something like the above 80 times... you saved me! thank you 🙂
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content