06-25-2013 06:29 AM
Look at this basic dataset.
I'm using PROC UNIVARIATE to calculate the median of Var2, which is straightforward so far. But I also need to save the corresponding value of Var1 in the output dataset (in this case "D"). Is there an option in PROC UNIVARIATE to do that?
06-25-2013 08:37 AM
But the median can sometimes be between 2 values (e.g. 1,2,3,4 = 2.5), or there could be multiple rows that have the median value (e.g. 1,2,2,2,3 = 2). How would you want to treat those?
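To make the two cases concrete, here is a small sketch using the example values from this post. The dataset names `even_n` and `ties` and the variable `x` are made up for illustration.

```sas
/* Even number of observations: the median falls between two values */
data even_n;
  input x @@;
  datalines;
1 2 3 4
;
run;

/* Several rows share the median value */
data ties;
  input x @@;
  datalines;
1 2 2 2 3
;
run;

/* With the default percentile definition, the first median is 2.5
   (matches no row) and the second is 2 (matches three rows) */
proc means data=even_n median; var x; run;
proc means data=ties   median; var x; run;
```

In the first case no observation can be returned alongside the median; in the second, three candidates exist and one has to be chosen.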
06-25-2013 09:11 AM
I was just thinking the same. In any case, for my purposes, ties do not affect the quality of the results; I could simply pick one of the tied rows.
However I think that proc univariate does not have this kind of option. Does it?
Should that be the case, I imagine that I'd have to remerge my results with the original dataset. Am I right?
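A minimal sketch of that remerge, assuming a dataset `have` with a character `Var1` and a numeric `Var2`. The original data was not preserved in this thread, so the values below are hypothetical, chosen only so that the median row is "D". The names `med`, `want`, and `Median` are also made up.

```sas
/* Hypothetical stand-in for the original dataset */
data have;
  input Var1 $ Var2;
  datalines;
A 12
B 7
C 30
D 15
E 21
;
run;

/* Write the median of Var2 to a one-row dataset */
proc univariate data=have noprint;
  var Var2;
  output out=med median=Median;
run;

/* Join back to the original rows whose Var2 equals the median */
proc sql;
  create table want as
  select h.Var1, h.Var2, m.Median
  from have as h, med as m
  where h.Var2 = m.Median;
quit;
```

With an even number of observations the median is typically an average of two values, so this join can return no rows; with ties it returns every tied row.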
06-25-2013 09:38 AM
I don't believe such an option exists, so your suggestion is one way to go, although you'll have to deal with the situations I described. Another method would be to sort the data by Var2, then step through the observations until Var2 >= median and output that observation.
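The sort-and-scan method could look roughly like this. It is a sketch under assumptions: the dataset `have` and its values are hypothetical (the original data is not shown in the thread), and `med`, `sorted`, `want`, and `Median` are illustrative names.

```sas
/* Hypothetical stand-in for the original dataset */
data have;
  input Var1 $ Var2;
  datalines;
A 12
B 7
C 30
D 15
E 21
;
run;

/* One-row dataset holding the median of Var2 */
proc univariate data=have noprint;
  var Var2;
  output out=med median=Median;
run;

proc sort data=have out=sorted;
  by Var2;
run;

data want;
  if _n_ = 1 then set med;   /* load Median once; it is retained */
  set sorted;
  if Var2 >= Median then do; /* first row at or above the median */
    output;
    stop;
  end;
run;
```

Because the scan stops at the first row with Var2 >= median, it always returns exactly one observation: the upper of the two middle rows when N is even, and the first tied row when several rows share the median.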
06-26-2013 01:35 PM
That brings to mind: what about using PROC RANK? If the number of observations N is odd, select the record with rank = floor(N/2) + 1. If N is even, select both records with ranks floor(N/2) and floor(N/2) + 1 and take the mean of those two values. I'm sure there is a fairly straightforward way to program this in a data step after ranking the values.
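One way this might be programmed, as a sketch only. It assumes Var2 has no missing values and no ties (PROC RANK assigns averaged ranks to ties by default, which would break the equality test). The dataset `have`, its values, and the names `ranked`, `middle`, `med`, `Rank2`, `lo`, and `hi` are hypothetical.

```sas
/* Hypothetical stand-in for the original dataset */
data have;
  input Var1 $ Var2;
  datalines;
A 12
B 7
C 30
D 15
E 21
;
run;

proc rank data=have out=ranked;
  var Var2;
  ranks Rank2;
run;

/* Keep the middle record (odd N) or the two middle records (even N) */
data middle;
  set ranked nobs=n;
  lo = floor((n + 1) / 2);   /* equals floor(N/2) when N is even */
  hi = floor(n / 2) + 1;     /* lo = hi when N is odd */
  if Rank2 = lo or Rank2 = hi then output;
  drop lo hi;
run;

/* The median is the mean of the selected Var2 values */
proc means data=middle noprint;
  var Var2;
  output out=med mean=Median;
run;
```

The dataset `middle` keeps Var1 for the selected rows, which is the piece PROC UNIVARIATE alone does not provide.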
06-27-2013 08:24 AM
Generalizing Steve's suggestion, why not just sort and then print out the floor(N/2)+1 observation, like this:
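data _null_;  /* assumed wrapper: store floor(N/2)+1 in macro variable MedIndex */
  if 0 then set sashelp.class nobs=n;  /* never executes; NOBS= is set at compile time */
  call symputx('MedIndex', floor(n/2) + 1); stop;
run;
proc sort data=sashelp.class out=class; by height; run;  /* BY variable assumed */
proc print data=class(firstobs=&MedIndex obs=&MedIndex); run;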