Sean_OConnor
Obsidian | Level 7

Folks,

 

I am producing some income statistics across different cohorts (regions, age groups, etc.). When reporting values I report only the median. However, when I cross-classify and segment my data into smaller groups, outliers start to affect the median.

 

My code below would work fine if I didn't have the BY statement in the PROC UNIVARIATE step.

 

However, the values in my dataset should be tested against different thresholds, depending on which BY group they fall in.

 

Is there a simple way to account for this?

 

proc univariate data = pop;
    by class year_filed age_filed_total;
    var income_price_ratiot1;
    output out=boxStats_before median=median qrange=iqr;
run;

/* Copy the statistics into macro variables */
data _null_;
    set boxStats_before;
    call symput('median', median);
    call symput('iqr', iqr);
run;

%put &median;
%put &iqr;

/* Keep rows within median +/- 1.5 * IQR */
data trimmed;
    set pop;
    if (income_price_ratiot1 le &median + 1.5 * &iqr) and (income_price_ratiot1 ge &median - 1.5 * &iqr);
run;

2 REPLIES
PaigeMiller
Diamond | Level 26

@Sean_OConnor wrote:

I am producing some income statistics across different cohorts (regions, age groups, etc.). When reporting values I report only the median. However, when I cross-classify and segment my data into smaller groups, outliers start to affect the median.


Please show us an example of the data showing this problem of outliers impacting the median. As I understand things, the median should not be affected by outliers, unless there are only two data points in a group, or when half the values in a group are outliers.
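
To illustrate the point about robustness, here is a minimal sketch using a made-up data set (the name toy and its values are hypothetical, not taken from the original post). Group B has one extreme value; its mean moves a long way, but its median stays the same as group A's.

```sas
/* Hypothetical toy data: two groups, group B contains one extreme value */
data toy;
   input grp $ x;
   datalines;
A 10
A 12
A 14
A 16
A 18
B 10
B 12
B 14
B 16
B 1800
;
run;

/* CLASS gives per-group statistics without pre-sorting the data */
proc means data=toy n mean median;
   class grp;
   var x;
run;
```

Both groups have a median of 14, while the means are 14 for group A and 370.4 for group B.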

--
Paige Miller
ballardw
Super User

You don't clearly describe exactly what impact you may be seeing.

 

For summary statistics written to a data set, I suggest you consider PROC SUMMARY: with a CLASS statement it produces a summary for every combination of the class variables, and you can select the ones you need.

Consider the output data set from

proc summary data = pop;
   class  class year_filed age_filed_total;
   var income_price_ratiot1;
   output out=boxStats_before median=median  qrange=iqr_;
run; 

There will be a variable _type_ in the output data set that indicates which combination of the variables on the CLASS statement each row summarizes.

So there will be an overall summary; summaries for class only, year_filed only, and age_filed_total only; summaries for class and year_filed, class and age_filed_total, and year_filed and age_filed_total; and a summary for the combination of all three variables.
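
As a rough sketch of how those combinations map to _type_ values, assuming the default behaviour (no NWAY, TYPES, or WAYS option) and the CLASS variable order shown above, you could list the values that appear in the output. The data set name boxStats_all is just a placeholder.

```sas
proc summary data=pop;
   class class year_filed age_filed_total;
   var income_price_ratiot1;
   output out=boxStats_all median=median qrange=iqr;
run;

/* _TYPE_ is a bit pattern over the CLASS variables;
   the rightmost CLASS variable is the lowest bit:
     0 = overall summary
     1 = age_filed_total only
     2 = year_filed only
     3 = year_filed * age_filed_total
     4 = class only
     5 = class * age_filed_total
     6 = class * year_filed
     7 = class * year_filed * age_filed_total */
proc freq data=boxStats_all;
   tables _type_;
run;
```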

Then you can select which summary you may need for specific uses by selecting on the _type_ value associated with the combination.
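
One possible way to use that for the trimming step in the question, sketched under some assumptions: the _type_ = 7 rows hold the statistics for the full three-way combination, and the filter is the same median +/- 1.5 * IQR rule as in the original code. The names boxStats_before, trimmed, and the alias names are only for illustration. Joining the statistics back onto the data means each row is compared with its own group's values, so no macro variables are needed.

```sas
/* Summaries for every combination of the CLASS variables */
proc summary data=pop;
   class class year_filed age_filed_total;
   var income_price_ratiot1;
   output out=boxStats_before median=median qrange=iqr;
run;

/* Attach each three-way cell's statistics (_TYPE_ = 7) to its rows
   and keep only rows inside that cell's own limits.
   PROC SQL is used so that pop does not need to be sorted first. */
proc sql;
   create table trimmed as
   select p.*
   from pop as p
        inner join boxStats_before(where=(_type_ = 7)) as s
        on  p.class           = s.class
        and p.year_filed      = s.year_filed
        and p.age_filed_total = s.age_filed_total
   where p.income_price_ratiot1 between s.median - 1.5 * s.iqr
                                    and s.median + 1.5 * s.iqr;
quit;
```

Adding the NWAY option to the PROC SUMMARY statement would write only the highest-level combination, which makes the _type_ filter unnecessary. Note that, by default, observations with a missing value for any CLASS variable are left out of the summaries, so they would also drop out of trimmed here.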

 

 
