Calcite | Level 5

## proc univariate, median's option

Hello,

Look at this basic dataset:

    Var1 Var2
    A    2
    D    3
    C    4

I'm using PROC UNIVARIATE to calculate the median of Var2, which is straightforward so far. But I also need to save the corresponding value of Var1 in the output dataset (in this case "D"). Is there an option in PROC UNIVARIATE to do that?

Thank you

Obsidian | Level 7

## Re: proc univariate, median's option

But the median can sometimes fall between two values (e.g. the median of 1, 2, 3, 4 is 2.5), or several rows can share the median value (e.g. the median of 1, 2, 2, 2, 3 is 2). How would you want to treat those cases?

Calcite | Level 5

## Re: proc univariate, median's option

Hi Keith,

I was just thinking the same. In any case, for my purposes, ties do not affect the quality of the results; I could simply choose one of the tied rows.

However, I think that PROC UNIVARIATE does not have this kind of option. Does it?

If that is the case, I imagine I'd have to merge my results back with the original dataset. Am I right?

Thanks again

Obsidian | Level 7

## Re: proc univariate, median's option

I don't believe such an option exists, so your suggestion is one way to go, although you'll have to deal with the situations I described. Another method would be to sort the data by Var2, then loop through the rows until Var2 >= median and output that observation.
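A minimal sketch of the sort-and-loop idea, for illustration only. The dataset names `have`, `med`, `sorted`, and `want` and the variable name `MedVar2` are assumptions, not anything from the original question. It keeps the first row whose Var2 is at least the median, so with ties it picks one row arbitrarily, and when the median falls between two values it picks the upper neighbour.

```sas
/* Hypothetical input dataset mirroring the example in the question */
data have;
   input Var1 $ Var2;
   datalines;
A 2
D 3
C 4
;
run;

/* Write the median of Var2 to a one-row dataset (MedVar2 is an assumed name) */
proc univariate data=have noprint;
   var Var2;
   output out=med median=MedVar2;
run;

proc sort data=have out=sorted;
   by Var2;
run;

/* Read the median once, then keep the first row with Var2 >= median */
data want;
   if _n_ = 1 then set med;   /* MedVar2 is retained across iterations */
   set sorted;
   if Var2 >= MedVar2 then do;
      output;
      stop;
   end;
run;
```

For the three-row example this should leave a single row with Var1 = "D".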

Calcite | Level 5

## Re: proc univariate, median's option

OK, I think I'll go with the merge solution. I was looking for an option, but it's clear that I have to do some additional work...
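One way the merge route could look, as a hedged sketch rather than a definitive solution. The names `have`, `med`, `want`, and `MedVar2` are assumed for illustration. Note that the join returns no rows when the median lies between two data values, and several rows when the median value is tied.

```sas
/* Hypothetical input dataset mirroring the example in the question */
data have;
   input Var1 $ Var2;
   datalines;
A 2
D 3
C 4
;
run;

/* One-row dataset holding the median (MedVar2 is an assumed name) */
proc univariate data=have noprint;
   var Var2;
   output out=med median=MedVar2;
run;

/* Merge back: keep the original rows whose Var2 equals the median */
proc sql;
   create table want as
   select h.Var1, h.Var2
   from have as h
        inner join med as m
        on h.Var2 = m.MedVar2;
quit;
```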

Super User

## Re: proc univariate, median's option

The median may not be a value in your dataset, or it may be multiple values. Something to consider.

## Re: proc univariate, median's option

That brings to mind: what about using PROC RANK? If the number of observations N is odd, select the record with rank = floor(N/2) + 1. If N is even, select both records with ranks floor(N/2) and floor(N/2) + 1, and take the mean of those two values. I'm sure there is a fairly straightforward way to program this in a data step after ranking the values.

Steve Denham

SAS Super FREQ

## Re: proc univariate, median's option

Generalizing Steve's suggestion, why not just sort and then print out the floor(N/2)+1 observation, like this:

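    /* Store the position of the middle observation in a macro variable */
    data _NULL_;
      if 0 then set sashelp.class nobs=n;  /* nobs= is filled at compile time; no rows are read */
      call symputx('MedIndex', floor(n/2)+1);
      stop;
    run;

    /* Sort by the analysis variable */
    proc sort data=sashelp.class out=class;
      by age;
    run;

    /* Print only the middle observation */
    proc print data=class(firstobs=&MedIndex obs=&MedIndex);
    run;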
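A possible variant of the same idea that also covers the even-N case raised earlier in the thread, sketched under the assumption that keeping both middle rows is acceptable. The output dataset name `middle` and the helper variables `lo` and `hi` are made up for this example.

```sas
proc sort data=sashelp.class out=class;
   by age;
run;

/* Keep the middle row (odd N) or the two middle rows (even N) */
data middle;
   set class nobs=n;
   lo = floor((n + 1) / 2);   /* equals hi when n is odd */
   hi = floor(n / 2) + 1;
   if lo <= _n_ <= hi;        /* subsetting IF on the row position */
   drop lo hi;
run;
```

When `middle` holds two rows, the mean of `age` over those rows is the median, and either row's identifying variables can be kept, depending on how the in-between case should be treated.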
