Enomis
Calcite | Level 5

Hello,

Look at this basic dataset:

Var1  Var2
A     2
D     3
C     4

I'm using PROC UNIVARIATE to calculate the median of Var2, which is straightforward. But I also need to save the corresponding value of Var1 in the output dataset (in this case "D"). Is there an option in PROC UNIVARIATE to do that?

Thank you

Keith
Obsidian | Level 7

But the median can sometimes be between 2 values (e.g. 1,2,3,4 = 2.5), or there could be multiple rows that have the median value (e.g. 1,2,2,2,3 = 2).  How would you want to treat those?

Enomis
Calcite | Level 5

Hi Keith,

I was just thinking the same. In any case, for my purposes, multiple rows with the median value do not affect the quality of the results; I could choose any one of them.

However, I think that PROC UNIVARIATE does not have this kind of option. Does it?

If it doesn't, I imagine I'd have to merge my results back with the original dataset. Am I right?

Thanks again

Keith
Obsidian | Level 7

I don't believe this option exists, so your suggestion is one way to go, although you'll have to deal with the situations I described. Another method would be to sort the data by Var2, then loop through until Var2 >= median and output that observation.
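For anyone following along, here is a minimal sketch of the merge-back idea discussed above. The dataset names have, med and want and the variable name MedVar2 are made up for illustration; only Var1 and Var2 come from the original question.

```sas
/* Hypothetical input: the three-row example from the question */
data have;
   input Var1 $ Var2;
   datalines;
A 2
D 3
C 4
;
run;

/* Step 1: let PROC UNIVARIATE write the median to a one-row dataset */
proc univariate data=have noprint;
   var Var2;
   output out=med median=MedVar2;
run;

/* Step 2: join back to the original data and keep rows equal to the median */
proc sql;
   create table want as
   select h.Var1, h.Var2
   from have as h, med as m
   where h.Var2 = m.MedVar2;
quit;
```

For the three-row example this should keep the row with Var1 = D. Note that the join returns no rows when the median falls between two observed values, and several rows when more than one observation equals the median, so those cases still need a rule of your choosing.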

Enomis
Calcite | Level 5

OK, I think I'll go with the merge solution. I was looking for an option, but it's clear that I have to do some additional work...

Thanks for your useful advice

Reeza
Super User

The median may not be a value in your dataset, or it may be multiple values. Something to consider.

SteveDenham
Jade | Level 19

That brings to mind: what about using PROC RANK? Let N be the number of observations. If N is odd, select the record with rank = floor(N/2) + 1. If N is even, select both records with ranks floor(N/2) and floor(N/2) + 1, and take the mean of those two values. I'm sure there is a fairly straightforward way to program this in a data step after ranking the values.

Steve Denham
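A rough sketch of how the PROC RANK idea might look in practice. It assumes Var2 has no tied and no missing values (with ties, the default TIES=MEAN gives averaged ranks, so the equality test below may match nothing). The dataset names have, ranked and want and the variables Var2Rank and mid are hypothetical.

```sas
/* Hypothetical input: the three-row example from the question */
data have;
   input Var1 $ Var2;
   datalines;
A 2
D 3
C 4
;
run;

/* Rank Var2 in ascending order; the rank goes into a new variable */
proc rank data=have out=ranked;
   var Var2;
   ranks Var2Rank;
run;

/* Keep the middle record (N odd) or the two middle records (N even) */
data want;
   set ranked nobs=n;               /* n = number of observations */
   mid = floor(n/2) + 1;
   if Var2Rank = mid
      or (mod(n, 2) = 0 and Var2Rank = mid - 1);
   drop mid;
run;
```

When N is even, want holds two rows; averaging their Var2 values gives the usual median, but there is then no single Var1 to report, so you still have to pick one.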

Rick_SAS
SAS Super FREQ

Generalizing Steve's suggestion, why not just sort and then print out the floor(N/2)+1 observation, like this:

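/* Store the position of the middle observation in a macro variable */
data _NULL_;
   if 0 then set sashelp.class nobs=n;   /* never executes; only supplies n */
   call symputx('MedIndex', floor(n/2)+1);
   stop;
run;

/* Sort by the analysis variable */
proc sort data=sashelp.class out=class;
   by age;
run;

/* Print only the middle observation of the sorted data */
proc print data=class(firstobs=&MedIndex obs=&MedIndex);
run;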
