Finding the median of a variable in a data step

Occasional Contributor
Posts: 8

I have an undefined number of observations sorted by var1, and I'd like to find the median of var1 in a data step (without using proc means). I know I can add a variable for _n_, which will give me the observation number. What I'm wondering is how to get the middle observation (if the number of observations is odd) or the average of the two middle observations (if the number of observations is even).

Any help on this would be appreciated.

PROC Star
Posts: 7,490

Re: Finding the median of a variable in a data step

Posted in reply to eggman2001

Why would you want to do this since a number of procs already have the ability to do all of the work?

Anyhow, since you asked, how about this (using your previous example as the data)?

data have;
  input var1-var10 col1-col11;
  cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
;

/* turn the single wide row into one observation per value */
data want (keep=variable);
  set have;
  array vars _all_;
  do over vars;
    variable=vars;
    output;
  end;
run;

/* assumes want is already in ascending order of variable */
data median (keep=median);
  set want end=lastrec nobs=numobs;
  retain low high;
  /* odd count: the middle observation serves as both low and high */
  if numobs/2-int(numobs/2) and _n_ eq ceil(numobs/2) then do;
    low=variable;
    high=variable;
  end;
  /* otherwise pick up the observations on either side of the midpoint */
  else if _n_ eq int(numobs/2) then low=variable;
  else if _n_ eq int(numobs/2)+1 then high=variable;
  if lastrec then do;
    median=sum(low,high)/2;
    output;
  end;
run;

Respected Advisor
Posts: 4,931

Re: Finding the median of a variable in a data step

Posted in reply to eggman2001

Another way:

data have;
input var1 @@;
datalines;
4 7 2 6 32 4 5 8 3 7
;

data _null_;
set have nobs=n;
call symput ("firstobs", ceil(n/2));
call symput ("obs", ceil(n/2)+(mod(n,2)=0));
stop;
run;

proc sort data=have; by var1; run;

data want(keep=median);
set have (firstobs=&firstobs. obs=&firstobs.);
varx = var1;
set have (firstobs=&obs. obs=&obs.);
median = mean(var1,varx);
run;

PG

Super User
Posts: 5,516

Re: Finding the median of a variable in a data step

PGStats,

I know you know this, but it looks like you're burning a little too much midnight oil.  Switch the CALL SYMPUTs to:

call symputx("firstobs", floor(n/2));

call symputx("obs", ceil(n/2));

The tools are good, the details are tricky.  Also, can the formula for medians get complex if there can be ties?

PROC Star
Posts: 7,490

Re: Finding the median of a variable in a data step

Posted in reply to Astounding

Ties are irrelevant.  The definition can be found at: Median - Wikipedia, the free encyclopedia

Super User
Posts: 5,516

Re: Finding the median of a variable in a data step

Art,

I'm focusing on the word "usually" at the end of the first paragraph of your link, and also on the PCTLDEF option within PROC UNIVARIATE.

You might be right on this, but I'm just not sure yet.

PROC Star
Posts: 7,490

Re: Finding the median of a variable in a data step

Posted in reply to Astounding

If one uses one of the SAS procs or functions for getting the median, that definition is the one that is used.  See, e.g.,

Base SAS(R) 9.2 Procedures Guide
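
For comparison with the hand-rolled data steps in this thread, the procedure route might look like the sketch below. It assumes the have dataset and var1 variable from PGStats's example. QNTLDEF=5 (the empirical distribution function with averaging) is the PROC MEANS default and is written out here only to make the percentile definition explicit.

```sas
/* Sketch only: median via PROC MEANS, using the default percentile definition */
proc means data=have median qntldef=5;
   var var1;   /* have and var1 are taken from the earlier example */
run;
```

With an even number of observations this definition averages the two middle values, which is the same rule the data step versions implement by hand.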

Respected Advisor
Posts: 4,931

Re: Finding the median of a variable in a data step

Posted in reply to Astounding

If N=5, I want firstobs=3 and obs=3. If N=4, I want firstobs=2 and obs=3. Hence the expressions I proposed.

Ties are not a problem. Empty datasets are, however.
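
A quick way to check the two expressions is to print them for a few small values of n. This is only a sketch; the loop bounds are arbitrary, and the variable names simply mirror the macro variable names used above.

```sas
/* Sketch: write the two middle observation numbers to the log for n = 1 to 6 */
data _null_;
   do n = 1 to 6;
      firstobs = ceil(n/2);                   /* lower middle observation */
      obs      = ceil(n/2) + (mod(n,2) = 0);  /* upper middle; equals firstobs when n is odd */
      put n= firstobs= obs=;
   end;
run;
```

For n=5 the log line should read firstobs=3 obs=3, and for n=4 it should read firstobs=2 obs=3, matching the values stated above.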

PG

Super User
Posts: 5,516

Re: Finding the median of a variable in a data step

PGStats,

You're right, my bad.  Where's that coffee?

Respected Advisor
Posts: 3,799

Re: Finding the median of a variable in a data step

If the data are sorted you can use direct access to find the one or two obs needed for median.

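proc sort data=sashelp.class(obs=16) out=class;
   by age;
   run;
data want;
   x = nobs/2;
   if mod(nobs,2) eq 0 then do;
      /* even count: average the two middle observations */
      do point = x,x+1;
         link set;
         s + age;
         end;
      median = s/2;
      end;
   else do point=ceil(x);
      /* odd count: take the single middle observation */
      link set;
      median = age;
      end;
   age = median;   /* without this, the kept age is just the last value read */
   output;
   stop;
   return;
set:
   set class nobs=nobs point=point;
   return;
   keep age ;
   format age 8.2;
   run;
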
Respected Advisor
Posts: 4,931

Re: Finding the median of a variable in a data step

Posted in reply to data_null__

Nicely done, DN. It can be further simplified as:

proc sort data=sashelp.class(obs=16) out=class;
   by age;
   run;

data want;
   if mod(nobs, 2) then do point=(1+nobs)/2;
      set class point=point;
      median = age;
      end;
   else do point = nobs/2, 1+nobs/2;
      set class nobs=nobs point=point;
      median + age/2;
      end;
   age = median;
   output; stop;
   keep age ;
   format age 8.2;
   run;

PG

Respected Advisor
Posts: 4,931

Re: Finding the median of a variable in a data step

Posted in reply to data_null__

Or, if you like compactness:

proc sort data=sashelp.class(obs=16) out=class;
   by age;
   run;

data want;
   do point=(mod(nobs, 2)+nobs)/2, (2-mod(nobs, 2)+nobs)/2 ;
      set class nobs=nobs point=point;
      median + age/2;
      end;
   age = median;
   output; stop;
   keep age ;
   format age 8.2;
   run;

PG

Super Contributor
Posts: 1,636

Re: Finding the median of a variable in a data step

Posted in reply to eggman2001

I am so smart that I have deleted my stupid post this morning.

Super User
Posts: 5,516

Re: Finding the median of a variable in a data step

Maybe I should have done the same.

To atone for my earlier fog, here's my less foggy version of the looping on this one:

do point = ceil(nobs/2), ceil( (nobs+1)/2 );

Super Contributor
Posts: 1,636

Re: Finding the median of a variable in a data step

Posted in reply to Astounding

Nice! It makes PG's code more compact:

data want;
   do point = ceil(nobs/2), ceil((nobs+1)/2);
      set class nobs=nobs point=point;
      median + age/2;
      end;
   age = median;
   output; stop;
   keep age ;
   format age 8.2;
   run;
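
Empty datasets were flagged earlier in the thread as the one case these steps do not handle. One possible guard is sketched below. It assumes that a missing median is the desired result for an empty input, which is a choice made for this sketch and not something stated in the thread. It relies on the NOBS= variable being assigned at compile time, so it can be tested before any SET statement executes.

```sas
data want;
   /* nobs is populated at compile time by the SET statement further down */
   if nobs = 0 then do;
      age = .;     /* assumption: report a missing median for empty input */
      output;
      stop;
   end;
   do point = ceil(nobs/2), ceil((nobs+1)/2);
      set class nobs=nobs point=point;
      median + age/2;
   end;
   age = median;
   output; stop;
   keep age;
   format age 8.2;
run;
```

For a non-empty class dataset the guard is skipped and the step behaves exactly like the version above.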
