BookmarkSubscribeRSS Feed
Adir92
Obsidian | Level 7

Hello everyone, 

I am currently looking for the median when we have a case like this:

Duration timeCase 
42
21
93
53

 

I want the median of Duration Time but here we have the nb of cases which specifies how many Duration Time there are for this value. That is, the original list is 2 4 4 5 5 5 9 9 9.
I have sorted and then used PROC MEANS median but the value does not seem correct because it will just take into account the field Duration time and work with the series 2 4 5 9.

PROC MEANS data = table median; 
var Duration Time; 
run;

How to take into account the value of the field Case ? Thank you for your help!

12 REPLIES 12
FreelanceReinh
Jade | Level 19

Hello @Adir92,

 

Use the right tool (Maxim 14):

data have;
input Duration Cases;
cards;
4 2
2 1
9 3
5 3
;

proc means data=have median;
freq cases;
var duration;
run;
Adir92
Obsidian | Level 7
Thank you but I have thousands of lines in my database, is it still usable ?
FreelanceReinh
Jade | Level 19

This should work as long as you can apply PROC MEANS to your "database."  The DATA step was just to create sample data for demonstration.

Adir92
Obsidian | Level 7
Okay so I just have to add freq in the code ? And it will take in account cases ?
FreelanceReinh
Jade | Level 19

@Adir92 wrote:
Okay so I just have to add freq in the code ? And it will take in account cases ?

If you are unsure as to whether the FREQ statement takes into account the number of cases, add the N statistic to the PROC MEANS statement:

proc means data=have n median;
freq cases;
var duration;
run;

In your example (with 4 observations) the result is N=9.

Adir92
Obsidian | Level 7
Okay thank you, I tried several times but its so weird I have different result. When I do without procedures and with proc means. For example, before computing the median I want to try with the mean and when I dot that I find 2.222 (in our current example) which is the real number but when I use proc means I find 5. I don't understand...
PaigeMiller
Diamond | Level 26

@Adir92 wrote:
Okay thank you, I tried several times but its so weird I have different result. When I do without procedures and with proc means. For example, before computing the median I want to try with the mean and when I dot that I find 2.222 (in our current example) which is the real number but when I use proc means I find 5. I don't understand...

Show us the exact code you used. I believe PROC MEANS (unless you made a coding error). I don't believe 2.222 unless you can show us how you obtained that number.

--
Paige Miller
Adir92
Obsidian | Level 7

Yes ! I answered just below.

Adir92
Obsidian | Level 7

I find different results when I do this : 

proc means data=table n mean; 
freq Case; 
var Duration_Time; 
run; 

And this : 

select sum(Duration_Time)/sum(Case)
from table
PaigeMiller
Diamond | Level 26

@Adir92 wrote:

I find different results when I do this : 

proc means data=table n mean; 
freq Case; 
var Duration_Time; 
run; 

And this : 

select sum(Duration_Time)/sum(Case)
from table

How did you transition from computing a median to the above, which is a weighted mean? Do you really want a weighted mean? Please explain.

 

Your SQL has the wrong formula for a weighted mean.

 

It should say

 

select sum(duration_time*case)/sum(case)

and even that will fail to give the right answer if duration_time is missing <-- which is a good reason (among many reasons) to NOT do this in SQL.

--
Paige Miller
LinusH
Tourmaline | Level 20

If you count your records in thousands, I don't think you'll have any performance issues.

What database do you use?

For Oracle as an example, MEDIAN is not supported by implicit pass through. If it's available in your target DBMS you can always use explicit SQL pass through.

Data never sleeps
Adir92
Obsidian | Level 7
Yes, I'm using SAS EG and proc sql. I think the only way is to use procedures.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1716 views
  • 2 likes
  • 4 in conversation