DATA Step, Macro, Functions and more

How to get median using multiple criteria

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

How to get median using multiple criteria

[ Edited ]

Hi, all!

 

I was hoping you could help me out on this query. I am able to do this in Excel using a combination of median and if function (somehow mimicking the averageif built-in function) but would like to have it translated in SAS due to the big amount of data I am working on.

 

For illustration purposes, I have this sample dataset. 

 

Month      Day      Volume

1               1           20

1               2           X

2               1           28

2               2           29

3               1           30

3               2           27

1               1           40

1               2           56

2               1           28

2               2           30

1               1           Y

1               2           30

 

In the dataset there are missing values represented by X and Y here. I need to fill out these missing values by getting the median of all the same month and same day as that of the missing field. For example, in order to get X, i get the median of all those with month = 1 and day = 2. Similarly for Y, i get the median of all those with month = 1 and day= 1.

 

Appreciate your help!  Thank you!


Accepted Solutions
Solution
‎08-02-2016 06:27 AM
Super User
Posts: 19,789

Re: How to get median using multiple criteria

Here's an example via code:

 


data have;
infile datalines truncover;
input Month      Day      Volume;
datalines;
1               1           20
1               2           
2               1           28
2               2           29
3               1           30
3               2           27
1               1           40
1               2           56
2               1           28
2               2           30
1               1           
1               2           30
;
run;
proc sort data=have;
by month day;
run;

proc stdize data=have out=want missing=median reponly;
by month day;
var volume;
run;

proc print data=want;
run;

View solution in original post


All Replies
Super User
Posts: 19,789

Re: How to get median using multiple criteria

Look at proc STDIZE with the MISSING option to replace the missing values. 

 

The month and Hour are your BY Groups

Solution
‎08-02-2016 06:27 AM
Super User
Posts: 19,789

Re: How to get median using multiple criteria

Here's an example via code:

 


data have;
infile datalines truncover;
input Month      Day      Volume;
datalines;
1               1           20
1               2           
2               1           28
2               2           29
3               1           30
3               2           27
1               1           40
1               2           56
2               1           28
2               2           30
1               1           
1               2           30
;
run;
proc sort data=have;
by month day;
run;

proc stdize data=have out=want missing=median reponly;
by month day;
var volume;
run;

proc print data=want;
run;
New Contributor
Posts: 4

Re: How to get median using multiple criteria

Thank you, Reeza! This was very helpful. Smiley Happy

New Contributor
Posts: 4

Re: How to get median using multiple criteria

by the way, what if instead of having missing values we have zero values instead? how do we get the median?

Super User
Posts: 19,789

Re: How to get median using multiple criteria

Then the solutions above wouldn't work. I would replace the 0 with missing and use the solution above. 

New Contributor
Posts: 4

Re: How to get median using multiple criteria

What if we need to add some criteria, say we need to have this logic:

 

if Volume < 20 then Adjusted Volume = median of the same month and day

 

Month      Day      Volume   Adjusted Volume

1               1           20

1               2          10

2               1           28

2               2           29

3               1           30

3               2           27

1               1           40

1               2           56

2               1           28

2               2           30

1               1           10

1               2           30

Super User
Posts: 19,789

Re: How to get median using multiple criteria

Then use the SQL code. You can add a case statement with conditional logic. 

 

If if you don't have SAS 9.4 then you'll have to either use proc means to calculate median, merge it in and execute conditional logic, or conditionally set values to missing ( volume<20) and use proc stdize. 

Super User
Posts: 19,789

Re: How to get median using multiple criteria

And one more option, BUT this only works if you're on SAS 9.4+. In previous version, SAS SQL does not calculate the median correctly.

 

proc sql;
create table want2 as
select *, median(volume) as med_volume, coalesce(volume, calculated med_volume) as Volume_Replaced
from have
group by month, day
order by month, day;
quit;

proc print data=want2;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 655 views
  • 1 like
  • 2 in conversation