Hi, all!
I was hoping you could help me out on this query. I am able to do this in Excel using a combination of median and if function (somehow mimicking the averageif built-in function) but would like to have it translated in SAS due to the big amount of data I am working on.
For illustration purposes, I have this sample dataset.
Month Day Volume
1 1 20
1 2 X
2 1 28
2 2 29
3 1 30
3 2 27
1 1 40
1 2 56
2 1 28
2 2 30
1 1 Y
1 2 30
In the dataset there are missing values represented by X and Y here. I need to fill out these missing values by getting the median of all the same month and same day as that of the missing field. For example, in order to get X, i get the median of all those with month = 1 and day = 2. Similarly for Y, i get the median of all those with month = 1 and day= 1.
Appreciate your help! Thank you!
Here's an example via code:
data have;
infile datalines truncover;
input Month Day Volume;
datalines;
1 1 20
1 2
2 1 28
2 2 29
3 1 30
3 2 27
1 1 40
1 2 56
2 1 28
2 2 30
1 1
1 2 30
;
run;
proc sort data=have;
by month day;
run;
proc stdize data=have out=want missing=median reponly;
by month day;
var volume;
run;
proc print data=want;
run;
Look at proc STDIZE with the MISSING option to replace the missing values.
The month and Hour are your BY Groups
Here's an example via code:
data have;
infile datalines truncover;
input Month Day Volume;
datalines;
1 1 20
1 2
2 1 28
2 2 29
3 1 30
3 2 27
1 1 40
1 2 56
2 1 28
2 2 30
1 1
1 2 30
;
run;
proc sort data=have;
by month day;
run;
proc stdize data=have out=want missing=median reponly;
by month day;
var volume;
run;
proc print data=want;
run;
Thank you, Reeza! This was very helpful.
by the way, what if instead of having missing values we have zero values instead? how do we get the median?
Then the solutions above wouldn't work. I would replace the 0 with missing and use the solution above.
What if we need to add some criteria, say we need to have this logic:
if Volume < 20 then Adjusted Volume = median of the same month and day
Month Day Volume Adjusted Volume
1 1 20
1 2 10
2 1 28
2 2 29
3 1 30
3 2 27
1 1 40
1 2 56
2 1 28
2 2 30
1 1 10
1 2 30
Then use the SQL code. You can add a case statement with conditional logic.
If if you don't have SAS 9.4 then you'll have to either use proc means to calculate median, merge it in and execute conditional logic, or conditionally set values to missing ( volume<20) and use proc stdize.
And one more option, BUT this only works if you're on SAS 9.4+. In previous version, SAS SQL does not calculate the median correctly.
proc sql;
create table want2 as
select *, median(volume) as med_volume, coalesce(volume, calculated med_volume) as Volume_Replaced
from have
group by month, day
order by month, day;
quit;
proc print data=want2;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.