Hi, all!
I was hoping you could help me out on this query. I am able to do this in Excel using a combination of median and if function (somehow mimicking the averageif built-in function) but would like to have it translated in SAS due to the big amount of data I am working on.
For illustration purposes, I have this sample dataset.
Month Day Volume
1 1 20
1 2 X
2 1 28
2 2 29
3 1 30
3 2 27
1 1 40
1 2 56
2 1 28
2 2 30
1 1 Y
1 2 30
In the dataset there are missing values represented by X and Y here. I need to fill out these missing values by getting the median of all the same month and same day as that of the missing field. For example, in order to get X, i get the median of all those with month = 1 and day = 2. Similarly for Y, i get the median of all those with month = 1 and day= 1.
Appreciate your help! Thank you!
Here's an example via code:
data have;
infile datalines truncover;
input Month Day Volume;
datalines;
1 1 20
1 2
2 1 28
2 2 29
3 1 30
3 2 27
1 1 40
1 2 56
2 1 28
2 2 30
1 1
1 2 30
;
run;
proc sort data=have;
by month day;
run;
proc stdize data=have out=want missing=median reponly;
by month day;
var volume;
run;
proc print data=want;
run;
Look at proc STDIZE with the MISSING option to replace the missing values.
The month and Hour are your BY Groups
Here's an example via code:
data have;
infile datalines truncover;
input Month Day Volume;
datalines;
1 1 20
1 2
2 1 28
2 2 29
3 1 30
3 2 27
1 1 40
1 2 56
2 1 28
2 2 30
1 1
1 2 30
;
run;
proc sort data=have;
by month day;
run;
proc stdize data=have out=want missing=median reponly;
by month day;
var volume;
run;
proc print data=want;
run;
Thank you, Reeza! This was very helpful.
by the way, what if instead of having missing values we have zero values instead? how do we get the median?
Then the solutions above wouldn't work. I would replace the 0 with missing and use the solution above.
What if we need to add some criteria, say we need to have this logic:
if Volume < 20 then Adjusted Volume = median of the same month and day
Month Day Volume Adjusted Volume
1 1 20
1 2 10
2 1 28
2 2 29
3 1 30
3 2 27
1 1 40
1 2 56
2 1 28
2 2 30
1 1 10
1 2 30
Then use the SQL code. You can add a case statement with conditional logic.
If if you don't have SAS 9.4 then you'll have to either use proc means to calculate median, merge it in and execute conditional logic, or conditionally set values to missing ( volume<20) and use proc stdize.
And one more option, BUT this only works if you're on SAS 9.4+. In previous version, SAS SQL does not calculate the median correctly.
proc sql;
create table want2 as
select *, median(volume) as med_volume, coalesce(volume, calculated med_volume) as Volume_Replaced
from have
group by month, day
order by month, day;
quit;
proc print data=want2;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.