eggman2001
Calcite | Level 5

I have an unknown number of observations sorted by var1, and I'd like to find the median of var1 in a data step (without using proc means). I know I can add a variable set to _n_, which gives me the observation number. What I can't work out is how to pick the middle observation (if the number of observations is odd) or average the two middle observations (if the number of observations is even).

Any help on this would be appreciated.

16 REPLIES
art297
Opal | Level 21

Why would you want to do this since a number of procs already have the ability to do all of the work?

Anyhow, since you asked, how about the following (using your previous example as the data)?

data have;
  input var1-var10 col1-col11;
  cards;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
;

/* Turn the single wide row into one observation per value */
data want (keep=variable);
  set have;
  array vars _all_;
  do over vars;
    variable=vars;
    output;
  end;
run;

/* Walk the (already sorted) values and remember the middle one or two */
data median (keep=median);
  set want end=lastrec nobs=numobs;
  retain low high;
  /* odd count: the single middle observation is both low and high */
  if numobs/2-int(numobs/2) and _n_ eq ceil(numobs/2) then do;
    low=variable;
    high=variable;
  end;
  else if _n_ eq int(numobs/2) then low=variable;
  else if _n_ eq int(numobs/2)+1 then high=variable;
  if lastrec then do;
    median=sum(low,high)/2;
    output;
  end;
run;

PGStats
Opal | Level 21

Another way :

data have;
input var1 @@;
datalines;
4 7 2 6 32 4 5 8 3 7
;

data _null_;
set have nobs=n;
call symput ("firstobs", ceil(n/2));
call symput ("obs", ceil(n/2)+(mod(n,2)=0));
stop;
run;

proc sort data=have; by var1; run;

data want(keep=median);
set have (firstobs=&firstobs. obs=&firstobs.);
varx = var1;
set have (firstobs=&obs. obs=&obs.);
median = mean(var1,varx);
run;

PG

Astounding
PROC Star

PGStats,

I know you know this, but it looks like you're burning a little too much midnight oil.  Switch the CALL SYMPUTs to:

call symputx("firstobs", floor(n/2));

call symputx("obs", ceil(n/2));

The tools are good, the details are tricky.  Also, can the formula for medians get complex if there can be ties?

art297
Opal | Level 21

Ties are irrelevant.  The definition can be found at: Median - Wikipedia, the free encyclopedia

Astounding
PROC Star

Art,

I'm focusing on the word "usually" at the end of the first paragraph of your link.

Also, on the PCTLDEF option within PROC UNIVARIATE.

You might be right on this, but I'm just not sure yet.

art297
Opal | Level 21

If one uses one of the SAS procs or functions for getting the median, that definition is the one that is used.  See, e.g.,

Base SAS(R) 9.2 Procedures Guide
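For anyone who wants to check a hand-rolled data step against the built-in tools, here is a minimal sketch. The dataset name have and the ten values are only sample data, borrowed from PGStats's post above. QNTLDEF=5 is, as far as I know, already the default in PROC MEANS (PROC UNIVARIATE calls the same setting PCTLDEF=); it is spelled out only to make the definition explicit.

```sas
/* Sample data only; any numeric variable will do */
data have;
   input var1 @@;
   datalines;
4 7 2 6 32 4 5 8 3 7
;

/* QNTLDEF=5 averages the two middle values when the count is even */
proc means data=have median qntldef=5;
   var var1;
run;

/* The MEDIAN function works across its arguments within one observation */
data _null_;
   m = median(4, 7, 2, 6, 32, 4, 5, 8, 3, 7);
   put m=;
run;
```

The values include two pairs of ties (4 and 7), and both the proc and the function should still agree with the average of the 5th and 6th sorted values.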

PGStats
Opal | Level 21

If N=5, I want firstobs=3 and obs=3. If N=4, I want firstobs=2 and obs=3. Hence the expressions I proposed.
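To make the arithmetic easy to check, here is a small sketch that tabulates both pairs of expressions for n = 1 to 6. The dataset name middle_index and the column names are made up for illustration. The first/last columns use the expressions from my earlier post; alt_first/alt_last use the floor/ceil pair suggested as a replacement.

```sas
/* Tabulate the one or two observation numbers that hold the median */
data middle_index;
   do n = 1 to 6;
      first     = ceil(n/2);                   /* lower middle observation */
      last      = ceil(n/2) + (mod(n,2) = 0);  /* upper middle; equals first when n is odd */
      alt_first = floor(n/2);                  /* suggested replacement for firstobs */
      alt_last  = ceil(n/2);                   /* suggested replacement for obs */
      output;
   end;
run;

proc print data=middle_index noobs;
run;
```

For n=5 the first pair gives 3 and 3 while the alternative gives 2 and 3; for n=4 the first pair gives 2 and 3 while the alternative gives 2 and 2.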

Ties are not a problem. Empty datasets are, however.

PG

Astounding
PROC Star

PGStats,

You're right, my bad.  Where's that coffee?

data_null__
Jade | Level 19

If the data are sorted you can use direct access to find the one or two obs needed for median.

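proc sort data=sashelp.class(obs=16) out=class;
   by age;
   run;
data want;
   x = nobs/2;
   /* even count: read the two middle observations and average them */
   if mod(nobs,2) eq 0 then do;
      do point = x,x+1;
         link set;
         s + age;
         end;
      median = s/2;
      end;
   /* odd count: read the single middle observation */
   else do point=ceil(x);
      link set;
      median = age;
      end;
   age = median;   /* only AGE is kept, so store the result there */
   output;
   stop;
   return;
set:
   set class nobs=nobs point=point;
   return;
   keep age ;
   format age 8.2;
   run;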
PGStats
Opal | Level 21

Nicely done DN. It can be further simplified as :

proc sort data=sashelp.class(obs=16) out=class;
   by age;
   run;

data want;
   if mod(nobs, 2) then do point=(1+nobs)/2;
      set class point=point;
      median = age;
      end;
   else do point = nobs/2, 1+nobs/2;
      set class nobs=nobs point=point;
      median + age/2;
      end;
   age = median;
   output; stop;
   keep age ;
   format age 8.2;
   run;

PG

PGStats
Opal | Level 21

Or if you like compactness :

proc sort data=sashelp.class(obs=16) out=class;
   by age;
   run;

data want;
   do point=(mod(nobs, 2)+nobs)/2, (2-mod(nobs, 2)+nobs)/2 ;
      set class nobs=nobs point=point;
      median + age/2;
      end;
   age = median;
   output; stop;
   keep age ;
   format age 8.2;
   run;

PG

Linlin
Lapis Lazuli | Level 10

I am so smart that I have deleted my stupid post this morning:smileylaugh:.

Astounding
PROC Star

Maybe I should have done the same.  :smileyshocked:

To atone for my earlier fog, here's my less foggy version of the looping on this one:

do point = ceil(nobs/2), ceil( (nobs+1)/2 );

Linlin
Lapis Lazuli | Level 10

Nice! It makes PG's code more compact:

data want;
   do point = ceil(nobs/2), ceil((nobs+1)/2);
      set class nobs=nobs point=point;
      median + age/2;
      end;
   age = median;
   output; stop;
   keep age ;
   format age 8.2;
   run;
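PGStats pointed out earlier that empty datasets are a problem for these approaches: with zero observations the loop would ask for observation 0. One possible guard, sketched here on made-up sample data (the dataset have and variable var1 are placeholders, not the class data used above), is to test nobs before reading anything. This works because nobs= is filled in when the step is compiled, not when the set statement executes.

```sas
/* Sample data; swap in an empty dataset to exercise the guard */
data have;
   input var1 @@;
   datalines;
4 7 2 6 32 4 5 8 3 7
;

proc sort data=have;
   by var1;
run;

data want(keep=median);
   /* nobs is known at compile time, so it can be tested before any read */
   if nobs = 0 then median = .;
   else do point = ceil(nobs/2), ceil((nobs+1)/2);
      set have nobs=nobs point=point;
      median + var1/2;   /* two halves: same obs twice when nobs is odd */
   end;
   output;
   stop;
run;
```

With the ten sample values the step reads sorted observations 5 and 6 (values 5 and 6) and writes a median of 5.5. With an empty input it should write a single observation with a missing median instead of failing on point=0.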


Discussion stats
  • 16 replies
  • 10361 views
  • 4 likes
  • 6 in conversation