Calcite | Level 5

## Taking difference of Median for 2 different datasets

Hi,

I have a financial dataset which is split by 2018Q1 and 2019Q1 and I need to find the difference of Median between two datasets. I can use proc means and by YearQtr however it brings PROC MEANS results for 2018Q1 and 2019Q1 and I have to do manually difference between two datasets. I am looking for a Macro that basically takes PROC Means results and put them in Array and then take difference and spit out MEDIAN for first dataset (i.e. 2018Q1), MEDIAN for second dataset (i.e. 2019Q1) and associated difference.

7 REPLIES 7
SAS Super FREQ

## Re: Taking difference of Median for 2 different datasets

I suggest you concatenate the data into a single data set that has an indicator variable Quarter that has values "2018Q1" and "2019Q1".

You can then use the CLASS statement to analyze both quarters in the same procedure call. For example, the following gives medians for males and females in the same data set:

``````proc means data=sashelp.class median;
class sex;
var height;
output out=Medians median=;
run;

data diff;
set Medians;
where _TYPE_ = 1;
diff = dif(Height);
run;

proc print data=diff;
run;``````

If you are a statistician, this approach also permits more sophisticated analyses. For example, instead of merely computing the difference between the group medians, you can ask whether the difference is statistically significant and get confidence intervals for the difference. This might be more than you need, but if you are interested, read about "The difference of medians in SAS"

``````proc quantreg data=Sashelp.Class;
class sex;
model height = sex / quantile=0.5;
estimate 'Diff in Medians' sex 1 -1 / CL;
run;``````

Calcite | Level 5

## Re: Taking difference of Median for 2 different datasets

You brought a very good point. Actually I am looking for calculating confidence level between variables in datasets. I do have joined the data so my dataset does contain Year & Quarter together. In order to find the overlap of Mean in two datasets, I was thinking to take difference automatically thru code instead of visually looking at it. Also, I have more than 15 variables to compare for Null Hypothesis & Statistically signficance. What do you recommend regarding checking confidence level and whether Null Hypothesis is correct or not?
SAS Super FREQ

## Re: Taking difference of Median for 2 different datasets

The article I linked to has a discussion. It also provides links to the doc for PROC NPAR1WAY, which contains many tests for medians, including the WILCOXON option requests the Wilcoxon test with the MEDIAN option, which requests the median test for difference in location.

Be sure to read the section of the NPAR1WAY doc about multiple comparisons and the DSCF option. You need to adjust for multiple comparisons if you have 15 class levels.

Fluorite | Level 6

## Re: Taking difference of Median for 2 different datasets

would you characterize proc quantreg as "median difference was determined by using quantile regression"? Would you expect to see similar significance values as mannU? Is there much a difference between the two?

SAS Super FREQ

## Re: Taking difference of Median for 2 different datasets

> would you characterize proc quantreg as "median difference was determined by using quantile regression"?

Yes.

> Would you expect to see similar significance values as mannU?

In many cases, yes. But for some data, one MannU might be significant and QUANTREG be no significant, or vice versa. For an example that uses normality tests, see "How often do different statistical tests agree?"

> Is there much a difference between the two?

Mathematically? Yes. In terms of testing the median difference, both should be usable in practice and help guide to you make valid inferences.

Is there much a difference between the two?

Fluorite | Level 6

## Re: Taking difference of Median for 2 different datasets

Thank you for taking time to reply. I just posted a new question regrading this topic, with my data, code, and results. If you do have a chance, I would appreciate your thoughts.

https://communities.sas.com/t5/Statistical-Procedures/Quantile-Regression-and-MannU-test-for-median-...

Super User

## Re: Taking difference of Median for 2 different datasets

You need to provide more details but taking the difference is a trivial operation that may not need a macro.

• What makes you think you need a macro? Most of the time it's simpler to not use macros, they're harder to debug and maintain over time.
• How many variables are you dealing with?If one variable this is easy, if for 200, slightly different approach needed.
• Do the variables have any naming conventions? This can make it easier to code.
• How many observations? Larger data may need a different approach for efficiency.
• Do you need to do this dynamically in the future, ie auto calculate the latest quarter and difference those?
• What version of SAS are you using? In SAS 9.4+ the Median function is valid in SQL, it's not available in prior versions.

@Flyer2345 wrote:

Hi,

I have a financial dataset which is split by 2018Q1 and 2019Q1 and I need to find the difference of Median between two datasets. I can use proc means and by YearQtr however it brings PROC MEANS results for 2018Q1 and 2019Q1 and I have to do manually difference between two datasets. I am looking for a Macro that basically takes PROC Means results and put them in Array and then take difference and spit out MEDIAN for first dataset (i.e. 2018Q1), MEDIAN for second dataset (i.e. 2019Q1) and associated difference.