## Finding the median of a variable in a data step

I have an unknown number of observations sorted by var1, and I want to find the median of var1 in a data step (without using proc means). I know I can add a variable for _n_, which gives me the observation number. What I can't work out is how to get the middle observation (if the number of observations is odd) or average the two middle observations (if it is even).

Any help on this would be appreciated.

## Re: Finding the median of a variable in a data step

Why would you want to do this since a number of procs already have the ability to do all of the work?

data have;
input var1-var10 col1-col11;
cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
;

data want (keep=variable);
set have;
array vars _all_;
do over vars;
variable=vars;
output;
end;
run;

data median (keep=median);
set want end=lastrec nobs=numobs;
retain low high;
if numobs/2-int(numobs/2) and _n_ eq ceil(numobs/2) then do;
low=variable;
high=variable;
end;
else if _n_ eq int(numobs/2) then low=variable;
else if _n_ eq int(numobs/2)+1 then high=variable;
if lastrec then do;
median=sum(low,high)/2;
output;
end;
run;

## Re: Finding the median of a variable in a data step

Another way:

data have;
input var1 @@;
datalines;
4 7 2 6 32 4 5 8 3 7
;

data _null_;
set have nobs=n;
call symput ("firstobs", ceil(n/2));
call symput ("obs", ceil(n/2)+(mod(n,2)=0));
stop;
run;

proc sort data=have; by var1; run;

data want(keep=median);
set have (firstobs=&firstobs. obs=&firstobs.);
varx = var1;
set have (firstobs=&obs. obs=&obs.);
median = mean(var1,varx);
run;

## Re: Finding the median of a variable in a data step

PGStats,

I know you know this, but it looks like you're burning a little too much midnight oil.  Switch the CALL SYMPUTs to:

call symputx("firstobs", floor(n/2));

call symputx("obs", ceil(n/2));

The tools are good, the details are tricky.  Also, can the formula for medians get complex if there can be ties?

## Re: Finding the median of a variable in a data step

Ties are irrelevant. The definition can be found at: Median - Wikipedia, the free encyclopedia

## Re: Finding the median of a variable in a data step

Art,

I'm focusing on the word "usually" at the end of the first paragraph of your link.

Also, on the PCTLDEF option within PROC UNIVARIATE.

You might be right on this, but I'm just not sure yet.

## Re: Finding the median of a variable in a data step

If one uses one of the SAS procs or functions to get the median, that definition is the one that is used. See, e.g.,

Base SAS(R) 9.2 Procedures Guide
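
As a cross-check for a hand-rolled data step median, the same number can be requested from a procedure. The following is only a minimal sketch using the ten values from the `datalines` example earlier in this thread; the dataset name `sample` is a placeholder, and the default percentile definition is assumed.

```sas
/* Placeholder dataset holding the ten values posted earlier in the thread */
data sample;
   input var1 @@;
   datalines;
4 7 2 6 32 4 5 8 3 7
;

/* Ask PROC MEANS for the median only */
proc means data=sample median;
   var var1;
run;
```

Sorted, the values are 2 3 4 4 5 6 7 7 8 32, so the two middle values are 5 and 6 and the reported median should be 5.5. The tied values (4 and 7) do not change how the middle positions are picked.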

## Re: Finding the median of a variable in a data step

If N=5, I want firstobs=3 and obs=3. If N=4, I want firstobs=2 and obs=3. Hence the expressions I proposed.

Ties are not a problem. Empty datasets are, however.
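
One possible way to handle the empty-dataset case in the macro-variable approach is sketched below. It assumes `have` is a native SAS dataset, so that `NOBS=` is reliable. With an ordinary `set`, an empty dataset ends the step before any `call symput` runs, so the macro variables are never created; `if 0 then set` lets the compiler fill in `n` without executing the read. The `max(1, ...)` clamp is an addition for illustration, not part of the original code.

```sas
data _null_;
   /* Never executed, but NOBS= is assigned at compile time */
   if 0 then set have nobs=n;
   /* Same positions as before: N=5 gives 3 and 3, N=4 gives 2 and 3 */
   /* Clamp to 1 so that N=0 does not produce FIRSTOBS=0 (assumption) */
   call symputx("firstobs", max(1, ceil(n/2)));
   call symputx("obs",      max(1, ceil(n/2) + (mod(n,2)=0)));
   stop;
run;
```

With an empty `have`, both macro variables resolve to 1, and the later `set have (firstobs=&firstobs. obs=&firstobs.)` should simply read nothing, leaving `want` with zero observations rather than failing on an unresolved macro reference.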

## Re: Finding the median of a variable in a data step

PGStats,

You're right, my bad.  Where's that coffee?

## Re: Finding the median of a variable in a data step

If the data are sorted you can use direct access to find the one or two obs needed for median.

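proc sort data=sashelp.class(obs=16) out=class;
by age;
run;

data want;
/* nobs is assigned at compile time by the SET statement below */
x = nobs/2;
if mod(nobs,2) eq 0 then do;
/* even count: average the two middle observations */
do point = x,x+1;
link set;
s + age;
end;
median = s/2;
end;
else do point=ceil(x);
/* odd count: take the single middle observation */
link set;
median = age;
end;
output;
stop;
return;
/* subroutine reached by LINK: read the observation numbered POINT */
set:
set class nobs=nobs point=point;
return;
keep median;
format median 8.2;
run;
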
## Re: Finding the median of a variable in a data step

Nicely done, DN. It can be further simplified as:

proc sort data=sashelp.class(obs=16) out=class;
by age;
run;

data want;
if mod(nobs, 2) then do point=(1+nobs)/2;
set class point=point;
median = age;
end;
else do point = nobs/2, 1+nobs/2;
set class nobs=nobs point=point;
median + age/2;
end;
age = median;
output; stop;
keep age ;
format age 8.2;
run;

## Re: Finding the median of a variable in a data step

Or, if you like compactness:

proc sort data=sashelp.class(obs=16) out=class;
by age;
run;

data want;
do point=(mod(nobs, 2)+nobs)/2, (2-mod(nobs, 2)+nobs)/2 ;
set class nobs=nobs point=point;
median + age/2;
end;
age = median;
output; stop;
keep age ;
format age 8.2;
run;

## Re: Finding the median of a variable in a data step

I am so smart that I deleted my stupid post this morning.

## Re: Finding the median of a variable in a data step

Maybe I should have done the same.

To atone for my earlier fog, here's my less foggy version of the looping on this one:

do point = ceil(nobs/2), ceil( (nobs+1)/2 );
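
To see why these two expressions pick the right positions, it may help to tabulate them for a few small counts. This is just an illustrative check; the variable names `first` and `second` are made up for the example.

```sas
data _null_;
   do nobs = 1 to 6;
      first  = ceil(nobs/2);       /* lower middle position */
      second = ceil((nobs+1)/2);   /* upper middle position */
      put nobs= first= second=;
   end;
run;
/* Log:
   nobs=1 first=1 second=1
   nobs=2 first=1 second=2
   nobs=3 first=2 second=2
   nobs=4 first=2 second=3
   nobs=5 first=3 second=3
   nobs=6 first=3 second=4
*/
```

For odd counts both positions coincide, so the middle observation is read twice and averaging it with itself returns it unchanged. For even counts the two positions are the adjacent middle pair.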

## Re: Finding the median of a variable in a data step

Nice! It makes PG's code more compact:

data want;
do point = ceil(nobs/2), ceil((nobs+1)/2);
set class nobs=nobs point=point;
median + age/2;
end;
age = median;
output; stop;
keep age ;
format age 8.2;
run;

