Sean_OConnor
Fluorite | Level 6

Folks,

 

I am producing some income statistics across different cohorts (regions, age groups, etc.). When it comes to reporting on values I'm just reporting the median. However, the issue I'm running into is that when I cross-classify and segment my data into smaller groups, outliers start to affect the median.

 

My code below would work fine if I didn't have the BY statement in PROC UNIVARIATE.

 

However, my dataset contains values that should be held to different cutoffs, depending on which BY group they fall in.

 

Is there a simple way to account for this?

 

proc univariate data = pop;
by  class year_filed age_filed_total;
var income_price_ratiot1;
output out=boxStats_before median=median  qrange=iqr;
run; 

data _null_;
	set boxStats_before;
	call symput ('median',median);
	call symput ('iqr', iqr);
run; 

%put &median;
%put &iqr;

data trimmed;
set pop;
    if (income_price_ratiot1 le &median + 1.5 * &iqr) and (income_price_ratiot1 ge &median - 1.5 * &iqr); 
run; 
2 REPLIES
PaigeMiller
Diamond | Level 26

@Sean_OConnor wrote:

I am producing some income statistics across different cohorts (regions, age groups, etc.). When it comes to reporting on values I'm just reporting the median. However, the issue I'm running into is that when I cross-classify and segment my data into smaller groups, outliers start to affect the median.


Please show us an example of the data showing this problem of outliers impacting the median. As I understand things, the median should not be affected by outliers, unless there are only two data points in a group, or when half the values in a group are outliers.

--
Paige Miller
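
A small illustration of the point above, as a sketch only: the data set demo and its values are made up for this example. Groups A and B differ in a single value (5 versus 500), so the mean moves a lot while the median stays put.

```sas
/* Hypothetical data: group B is group A with one extreme value */
data demo;
   input grp $ x;
   datalines;
A 1
A 2
A 3
A 4
A 5
B 1
B 2
B 3
B 4
B 500
;
run;

/* Compare mean and median within each group */
proc means data=demo mean median;
   class grp;
   var x;
run;
```

Both groups report a median of 3, while the mean is 3 for group A and 102 for group B. The median would only shift if the extreme values made up about half of a group, or if the group were very small.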
ballardw
Super User

You don't clearly describe exactly what impact you may be seeing.

 

I suggest that for summary statistics in a data set you might consider using proc summary as it will create different summaries that you can select.

Consider the output data set from

proc summary data = pop;
   class  class year_filed age_filed_total;
   var income_price_ratiot1;
   output out=boxStats_before median=median  qrange=iqr_;
run; 

There will be a variable _type_ in the output data set that indicates which combination of the variables on the CLASS statement each row summarizes.

So there will be an overall summary; class only, year_filed only, and age_filed_total only; class and year_filed, class and age_filed_total, and year_filed and age_filed_total; and the combination of all three variables.

Then you can select which summary you may need for specific uses by selecting on the _type_ value associated with the combination.
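
One possible way to use that output for per-group trimming, sketched under assumptions rather than offered as the definitive approach. The macro variables in the original post can hold only one median and one IQR, because CALL SYMPUT overwrites them on every row of the statistics data set and the last group's values win. Merging the statistics back onto the data gives each observation its own group's cutoffs instead. The data set names stats_3way and pop_sorted are hypothetical, the IQR is output as iqr here, and pop is assumed not to already contain variables named median or iqr.

```sas
/* Per-group median and IQR; _type_ marks which CLASS variables define each row */
proc summary data=pop;
   class class year_filed age_filed_total;
   var income_price_ratiot1;
   output out=boxStats_before median=median qrange=iqr;
run;

/* Keep only the full three-way combination (_type_ = 7 with three CLASS variables) */
data stats_3way;
   set boxStats_before;
   where _type_ = 7;
   keep class year_filed age_filed_total median iqr;
run;

/* Sort both data sets so they can be merged by group */
proc sort data=pop out=pop_sorted;
   by class year_filed age_filed_total;
run;

proc sort data=stats_3way;
   by class year_filed age_filed_total;
run;

/* Attach each group's own median and IQR, then apply the trimming rule per group */
data trimmed;
   merge pop_sorted (in=in_pop) stats_3way;
   by class year_filed age_filed_total;
   if in_pop;
   if (median - 1.5*iqr) <= income_price_ratiot1 <= (median + 1.5*iqr);
   drop median iqr;
run;
```

The NWAY option on the PROC SUMMARY statement would produce only the three-way rows directly, which makes the _type_ filter unnecessary. Note that observations with a missing value in any CLASS variable are left out of the statistics by default, so they receive no cutoffs; the MISSING option on PROC SUMMARY keeps those groups if they matter.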

 

 
