DATA Step, Macro, Functions and more

replacing Missing Values

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

replacing Missing Values

Hi All , I want to replace Missing values with average. As an example my data set contains variable salary Gender and age_group, i want to replace missing values of Salary with avg for That Gender(which is in Observation)and  falls in that particular Age Group.

I want to replace the Missing Value @ Id 8 with Avg Salary of female in Medium Age Group.

 

IdAge_groupGenderSalary
1YoungM44701
2Medium F41755
3OldM37127
4YoungF85233
5Medium M67517
6OldF99904
7YoungM26938
8Medium F 
9OldM37583
10YoungF78853
11Medium M 
12OldF73138
13YoungM29987
14Medium F33115
15OldM50883
16YoungF 
17Medium M48555
18OldF62939
19YoungM52488
20Medium F65333
21OldM87914

Accepted Solutions
Solution
a week ago
PROC Star
Posts: 1,558

Re: replacing Missing Values

[ Edited ]
Posted in reply to Savyasachi007

do you mean this?

 

data have;
infile datalines truncover;
input Id	Age_group : $15.	Gender $	Salary;
cards;
1	Young	M	44701
2	Medium 	F	41755
3	Old	M	37127
4	Young	F	85233
5	Medium 	M	67517
6	Old	F	99904
7	Young	M	26938
8	Medium 	F	 
9	Old	M	37583
10	Young	F	78853
11	Medium 	M	 
12	Old	F	73138
13	Young	M	29987
14	Medium 	F	33115
15	Old	M	50883
16	Young	F	 
17	Medium 	M	48555
18	Old	F	62939
19	Young	M	52488
20	Medium 	F	65333
21	Old	M	87914
;

proc sql;
create table want as
select *,sum(salary,mean(salary)*(salary=.)) as s
from have
group by gender,Age_group
order by id;
quit;

View solution in original post


All Replies
Solution
a week ago
PROC Star
Posts: 1,558

Re: replacing Missing Values

[ Edited ]
Posted in reply to Savyasachi007

do you mean this?

 

data have;
infile datalines truncover;
input Id	Age_group : $15.	Gender $	Salary;
cards;
1	Young	M	44701
2	Medium 	F	41755
3	Old	M	37127
4	Young	F	85233
5	Medium 	M	67517
6	Old	F	99904
7	Young	M	26938
8	Medium 	F	 
9	Old	M	37583
10	Young	F	78853
11	Medium 	M	 
12	Old	F	73138
13	Young	M	29987
14	Medium 	F	33115
15	Old	M	50883
16	Young	F	 
17	Medium 	M	48555
18	Old	F	62939
19	Young	M	52488
20	Medium 	F	65333
21	Old	M	87914
;

proc sql;
create table want as
select *,sum(salary,mean(salary)*(salary=.)) as s
from have
group by gender,Age_group
order by id;
quit;
PROC Star
Posts: 1,558

Re: replacing Missing Values

Posted in reply to novinosrin

The previous response shows you the missing and the average value to be imputed for those missings in the next column

The below one replaces the original salary column

 

data have;
infile datalines truncover;
input Id	Age_group : $15.	Gender $	Salary;
cards;
1	Young	M	44701
2	Medium 	F	41755
3	Old	M	37127
4	Young	F	85233
5	Medium 	M	67517
6	Old	F	99904
7	Young	M	26938
8	Medium 	F	 
9	Old	M	37583
10	Young	F	78853
11	Medium 	M	 
12	Old	F	73138
13	Young	M	29987
14	Medium 	F	33115
15	Old	M	50883
16	Young	F	 
17	Medium 	M	48555
18	Old	F	62939
19	Young	M	52488
20	Medium 	F	65333
21	Old	M	87914
;
proc sql;
create table want as
select id,Age_group,Gender,sum(salary,mean(salary)*(salary=.)) as salary
from have
group by gender,Age_group
order by id;
quit;
New Contributor
Posts: 4

Re: replacing Missing Values

Posted in reply to Savyasachi007
It worked , thanks but unable to understand multiplication of avg value with missing , please explain.
PROC Star
Posts: 1,558

Re: replacing Missing Values

Posted in reply to Savyasachi007

@Savyasachi007 I am about to step out for lunch with my mates. Give me 30 mins please, I will leave some notes. 

New Contributor
Posts: 4

Re: replacing Missing Values

Posted in reply to novinosrin
Sure ....
PROC Star
Posts: 1,558

Re: replacing Missing Values

Posted in reply to Savyasachi007

@Savyasachi007  Sorry took a bit too long as I was enjoying every bite lol

 

Here are the notes:

1.  There are three processes that is involved in this execution a. group by as mentioned by you  b. finding the mean  c. finding the missing  and imputing the mean 

2. Salary=. is binary expression that results in 1 or 0 or in other words True or false. If true which is when missing in our case, results in 1 and when not missing results in 0

3. Once mean is calculated, i multiply the mean with the result of binary expression to make sure I have 0's for not missing to be summed(added) with existing salary and computed mean salary value *1=computed mean salary value for missing salary

4. so this will result in final expression of 

0+salaryvalue(for non missing cases)

computedmeansalaryvalue+. (for missing cases)

5. The sum function is applied  instead of operator to ignore the missing values and the final result will have your imputed salary safely for your missing cases and original salary for non missing cases respectively

 

I hope that helps

 

Best Regards and have a nice day. 

New Contributor
Posts: 4

Re: replacing Missing Values

Posted in reply to novinosrin

Thanks A ton, 

Have a great day.

SAS Super FREQ
Posts: 4,171

Re: replacing Missing Values

Posted in reply to Savyasachi007

Replacing missing values by the mean is called "mean imputation," You can read about how to use PROC STDIZE to perform mean imputation---and why mean imputation is generally a bad idea, statistically speaking.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 223 views
  • 2 likes
  • 3 in conversation