Hello everyone,
I am currently looking for the median when we have a case like this:
Duration time | Case |
4 | 2 |
2 | 1 |
9 | 3 |
5 | 3 |
I want the median of Duration Time but here we have the nb of cases which specifies how many Duration Time there are for this value. That is, the original list is 2 4 4 5 5 5 9 9 9.
I have sorted and then used PROC MEANS median but the value does not seem correct because it will just take into account the field Duration time and work with the series 2 4 5 9.
PROC MEANS data = table median;
var Duration Time;
run;
How to take into account the value of the field Case ? Thank you for your help!
This should work as long as you can apply PROC MEANS to your "database." The DATA step was just to create sample data for demonstration.
@Adir92 wrote:
Okay so I just have to add freq in the code ? And it will take in account cases ?
If you are unsure as to whether the FREQ statement takes into account the number of cases, add the N statistic to the PROC MEANS statement:
proc means data=have n median;
freq cases;
var duration;
run;
In your example (with 4 observations) the result is N=9.
@Adir92 wrote:
Okay thank you, I tried several times but its so weird I have different result. When I do without procedures and with proc means. For example, before computing the median I want to try with the mean and when I dot that I find 2.222 (in our current example) which is the real number but when I use proc means I find 5. I don't understand...
Show us the exact code you used. I believe PROC MEANS (unless you made a coding error). I don't believe 2.222 unless you can show us how you obtained that number.
Yes ! I answered just below.
I find different results when I do this :
proc means data=table n mean;
freq Case;
var Duration_Time;
run;
And this :
select sum(Duration_Time)/sum(Case)
from table
@Adir92 wrote:
I find different results when I do this :
proc means data=table n mean; freq Case; var Duration_Time; run;
And this :
select sum(Duration_Time)/sum(Case) from table
How did you transition from computing a median to the above, which is a weighted mean? Do you really want a weighted mean? Please explain.
Your SQL has the wrong formula for a weighted mean.
It should say
select sum(duration_time*case)/sum(case)
and even that will fail to give the right answer if duration_time is missing <-- which is a good reason (among many reasons) to NOT do this in SQL.
If you count your records in thousands, I don't think you'll have any performance issues.
What database do you use?
For Oracle as an example, MEDIAN is not supported by implicit pass through. If it's available in your target DBMS you can always use explicit SQL pass through.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.