Enomis
Calcite | Level 5

Hello,

Look at this basic dataset:

Var1  Var2
A     2
D     3
C     4

I'm using PROC UNIVARIATE to calculate the median of Var2, which is straightforward. However, I also need to save the corresponding value of Var1 in the output dataset (in this case "D"). Is there an option in PROC UNIVARIATE to do that?

Thank you

Keith
Obsidian | Level 7

But the median can fall between two values (e.g. the median of 1, 2, 3, 4 is 2.5), or several rows can share the median value (e.g. the median of 1, 2, 2, 2, 3 is 2). How would you want to handle those cases?

Enomis
Calcite | Level 5

Hi Keith,

I was thinking the same thing. For my purposes, though, multiple matches do not affect the quality of the results; I could just pick one of them.

However, I don't think PROC UNIVARIATE has this kind of option. Does it?

If not, I imagine I'd have to merge the results back with the original dataset. Am I right?
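A minimal sketch of the merge-back idea, using the sample data from the question. The dataset names `have`, `med` and `median_rows` and the variable name `Var2_median` are hypothetical. It keeps every row whose Var2 equals the median, so it returns several rows when the median value is tied and no rows when the median falls between two data values (the cases raised above).

```sas
/* Sample data from the question */
data have;
   input Var1 $ Var2;
   datalines;
A 2
D 3
C 4
;

/* Compute the median of Var2 into a one-row dataset */
proc univariate data=have noprint;
   var Var2;
   output out=med median=Var2_median;
run;

/* Attach the median to every row (read once, then retained) */
/* and keep only the rows whose Var2 equals it               */
data median_rows;
   if _n_ = 1 then set med;
   set have;
   if Var2 = Var2_median;
run;
```

With the sample data, `median_rows` should contain the single row for Var1 = "D".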

Thanks again

Keith
Obsidian | Level 7

I don't believe that option exists, so your suggestion is one way to go, although you'll have to deal with the situations I described. Another method is to sort the data by Var2, then step through the rows until Var2 >= median and output that observation.
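The sort-and-scan approach might look roughly like this; it is a sketch, not a tested production solution, and the names `have`, `med`, `sorted`, `median_row` and `Var2_median` are hypothetical. Because it stops at the first row with Var2 >= median, it always returns exactly one row: when the median falls between two values (even N), it returns the upper of the two middle rows.

```sas
/* Sample data from the question */
data have;
   input Var1 $ Var2;
   datalines;
A 2
D 3
C 4
;

/* Compute the median of Var2 */
proc univariate data=have noprint;
   var Var2;
   output out=med median=Var2_median;
run;

/* Sort by the analysis variable */
proc sort data=have out=sorted;
   by Var2;
run;

/* Scan the sorted rows; output the first one at or above the median, then stop */
data median_row;
   if _n_ = 1 then set med;
   set sorted;
   if Var2 >= Var2_median then do;
      output;
      stop;
   end;
run;
```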

Enomis
Calcite | Level 5

OK, I think I'll go with the merge solution. I was hoping for an option, but clearly I'll have to do some additional work...

Thanks for your useful advice.

Reeza
Super User

The median may not be a value in your dataset, or it may be shared by multiple rows. Something to consider.

SteveDenham
Jade | Level 19

That brings to mind: what about using PROC RANK? Then select the record with rank = floor(N/2) + 1 if N, the number of observations, is odd; if N is even, select the two records with ranks floor(N/2) and floor(N/2) + 1 and take the mean of their values. I'm sure there is a fairly straightforward way to program this in a DATA step after ranking the values.
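One possible DATA step for the PROC RANK idea, sketched under a stated assumption: Var2 has no ties, so the ranks are exactly 1 to N. With ties, PROC RANK's default TIES=MEAN assigns averaged ranks and the target rank may not exist. The names `have`, `ranked`, `median_rows`, `Var2_rank` and the macro variable `NObs` are hypothetical.

```sas
/* Sample data from the question */
data have;
   input Var1 $ Var2;
   datalines;
A 2
D 3
C 4
;

/* Rank Var2 (assumes no ties, so ranks run 1..N) */
proc rank data=have out=ranked;
   var Var2;
   ranks Var2_rank;
run;

/* Store N in a macro variable without reading any rows */
data _null_;
   if 0 then set have nobs=n;
   call symputx('NObs', n);
   stop;
run;

/* Odd N: keep rank floor(N/2)+1. Even N: keep ranks floor(N/2) and floor(N/2)+1 */
data median_rows;
   set ranked;
   if mod(&NObs, 2) = 1 then do;
      if Var2_rank = floor(&NObs/2) + 1 then output;
   end;
   else if Var2_rank = floor(&NObs/2) or Var2_rank = floor(&NObs/2) + 1 then output;
run;
```

For even N, `median_rows` holds the two middle records; running PROC MEANS on it would give their mean, i.e. the median.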

Steve Denham

Rick_SAS
SAS Super FREQ

Generalizing Steve's suggestion, why not just sort the data and print observation number floor(N/2)+1, like this:

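/* Store the median position, floor(N/2)+1, in a macro variable */
data _NULL_;
   if 0 then set sashelp.class nobs=n;   /* NOBS= is set at compile time; no rows are read */
   call symputx('MedIndex', floor(n/2)+1);
   stop;
run;

/* Sort by the analysis variable */
proc sort data=sashelp.class out=class;
   by age;
run;

/* Print only the observation at the median position */
proc print data=class(firstobs=&MedIndex obs=&MedIndex);
run;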

