BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Savyasachi007
Fluorite | Level 6

Hi All , I want to replace Missing values with average. As an example my data set contains variable salary Gender and age_group, i want to replace missing values of Salary with avg for That Gender(which is in Observation)and  falls in that particular Age Group.

I want to replace the Missing Value @ Id 8 with Avg Salary of female in Medium Age Group.

 

IdAge_groupGenderSalary
1YoungM44701
2Medium F41755
3OldM37127
4YoungF85233
5Medium M67517
6OldF99904
7YoungM26938
8Medium F 
9OldM37583
10YoungF78853
11Medium M 
12OldF73138
13YoungM29987
14Medium F33115
15OldM50883
16YoungF 
17Medium M48555
18OldF62939
19YoungM52488
20Medium F65333
21OldM87914
1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

do you mean this?

 

data have;
infile datalines truncover;
input Id	Age_group : $15.	Gender $	Salary;
cards;
1	Young	M	44701
2	Medium 	F	41755
3	Old	M	37127
4	Young	F	85233
5	Medium 	M	67517
6	Old	F	99904
7	Young	M	26938
8	Medium 	F	 
9	Old	M	37583
10	Young	F	78853
11	Medium 	M	 
12	Old	F	73138
13	Young	M	29987
14	Medium 	F	33115
15	Old	M	50883
16	Young	F	 
17	Medium 	M	48555
18	Old	F	62939
19	Young	M	52488
20	Medium 	F	65333
21	Old	M	87914
;

proc sql;
create table want as
select *,sum(salary,mean(salary)*(salary=.)) as s
from have
group by gender,Age_group
order by id;
quit;

View solution in original post

8 REPLIES 8
novinosrin
Tourmaline | Level 20

do you mean this?

 

data have;
infile datalines truncover;
input Id	Age_group : $15.	Gender $	Salary;
cards;
1	Young	M	44701
2	Medium 	F	41755
3	Old	M	37127
4	Young	F	85233
5	Medium 	M	67517
6	Old	F	99904
7	Young	M	26938
8	Medium 	F	 
9	Old	M	37583
10	Young	F	78853
11	Medium 	M	 
12	Old	F	73138
13	Young	M	29987
14	Medium 	F	33115
15	Old	M	50883
16	Young	F	 
17	Medium 	M	48555
18	Old	F	62939
19	Young	M	52488
20	Medium 	F	65333
21	Old	M	87914
;

proc sql;
create table want as
select *,sum(salary,mean(salary)*(salary=.)) as s
from have
group by gender,Age_group
order by id;
quit;
novinosrin
Tourmaline | Level 20

The previous response shows you the missing and the average value to be imputed for those missings in the next column

The below one replaces the original salary column

 

data have;
infile datalines truncover;
input Id	Age_group : $15.	Gender $	Salary;
cards;
1	Young	M	44701
2	Medium 	F	41755
3	Old	M	37127
4	Young	F	85233
5	Medium 	M	67517
6	Old	F	99904
7	Young	M	26938
8	Medium 	F	 
9	Old	M	37583
10	Young	F	78853
11	Medium 	M	 
12	Old	F	73138
13	Young	M	29987
14	Medium 	F	33115
15	Old	M	50883
16	Young	F	 
17	Medium 	M	48555
18	Old	F	62939
19	Young	M	52488
20	Medium 	F	65333
21	Old	M	87914
;
proc sql;
create table want as
select id,Age_group,Gender,sum(salary,mean(salary)*(salary=.)) as salary
from have
group by gender,Age_group
order by id;
quit;
Savyasachi007
Fluorite | Level 6
It worked , thanks but unable to understand multiplication of avg value with missing , please explain.
novinosrin
Tourmaline | Level 20

@Savyasachi007 I am about to step out for lunch with my mates. Give me 30 mins please, I will leave some notes. 

Savyasachi007
Fluorite | Level 6
Sure ....
novinosrin
Tourmaline | Level 20

@Savyasachi007  Sorry took a bit too long as I was enjoying every bite lol

 

Here are the notes:

1.  There are three processes that is involved in this execution a. group by as mentioned by you  b. finding the mean  c. finding the missing  and imputing the mean 

2. Salary=. is binary expression that results in 1 or 0 or in other words True or false. If true which is when missing in our case, results in 1 and when not missing results in 0

3. Once mean is calculated, i multiply the mean with the result of binary expression to make sure I have 0's for not missing to be summed(added) with existing salary and computed mean salary value *1=computed mean salary value for missing salary

4. so this will result in final expression of 

0+salaryvalue(for non missing cases)

computedmeansalaryvalue+. (for missing cases)

5. The sum function is applied  instead of operator to ignore the missing values and the final result will have your imputed salary safely for your missing cases and original salary for non missing cases respectively

 

I hope that helps

 

Best Regards and have a nice day. 

Savyasachi007
Fluorite | Level 6

Thanks A ton, 

Have a great day.

Rick_SAS
SAS Super FREQ

Replacing missing values by the mean is called "mean imputation," You can read about how to use PROC STDIZE to perform mean imputation---and why mean imputation is generally a bad idea, statistically speaking.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1360 views
  • 2 likes
  • 3 in conversation