- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello everyone,
I am currently looking for the median when we have a case like this:
Duration time | Case |
4 | 2 |
2 | 1 |
9 | 3 |
5 | 3 |
I want the median of Duration Time but here we have the nb of cases which specifies how many Duration Time there are for this value. That is, the original list is 2 4 4 5 5 5 9 9 9.
I have sorted and then used PROC MEANS median but the value does not seem correct because it will just take into account the field Duration time and work with the series 2 4 5 9.
PROC MEANS data = table median;
var Duration Time;
run;
How to take into account the value of the field Case ? Thank you for your help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This should work as long as you can apply PROC MEANS to your "database." The DATA step was just to create sample data for demonstration.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Adir92 wrote:
Okay so I just have to add freq in the code ? And it will take in account cases ?
If you are unsure as to whether the FREQ statement takes into account the number of cases, add the N statistic to the PROC MEANS statement:
proc means data=have n median;
freq cases;
var duration;
run;
In your example (with 4 observations) the result is N=9.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Adir92 wrote:
Okay thank you, I tried several times but its so weird I have different result. When I do without procedures and with proc means. For example, before computing the median I want to try with the mean and when I dot that I find 2.222 (in our current example) which is the real number but when I use proc means I find 5. I don't understand...
Show us the exact code you used. I believe PROC MEANS (unless you made a coding error). I don't believe 2.222 unless you can show us how you obtained that number.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes ! I answered just below.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I find different results when I do this :
proc means data=table n mean;
freq Case;
var Duration_Time;
run;
And this :
select sum(Duration_Time)/sum(Case)
from table
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Adir92 wrote:
I find different results when I do this :
proc means data=table n mean; freq Case; var Duration_Time; run;
And this :
select sum(Duration_Time)/sum(Case) from table
How did you transition from computing a median to the above, which is a weighted mean? Do you really want a weighted mean? Please explain.
Your SQL has the wrong formula for a weighted mean.
It should say
select sum(duration_time*case)/sum(case)
and even that will fail to give the right answer if duration_time is missing <-- which is a good reason (among many reasons) to NOT do this in SQL.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you count your records in thousands, I don't think you'll have any performance issues.
What database do you use?
For Oracle as an example, MEDIAN is not supported by implicit pass through. If it's available in your target DBMS you can always use explicit SQL pass through.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content