Imputing missing numeric values is well discussed and solutions are well defined.
Is there a way we can replace a missing value of character variable with the values of observation with largest frequency?
See a rather brute force example:
/* create example data */
data class;
set sashelp.class end=done;
output;
if done
then do;
call missing(of _all_);
output;
end;
run;
/* update with largest freq */
proc sql;
create table upd (drop=count) as
select sex, count(*) as count from class
where sex ne ' '
group by sex
order by count descending;
create table want as
select name, coalesce(a.sex,b.sex) as sex, age, height, weight
from class a, upd (obs=1) b;
drop table upd;
quit;
See a rather brute force example:
/* create example data */
data class;
set sashelp.class end=done;
output;
if done
then do;
call missing(of _all_);
output;
end;
run;
/* update with largest freq */
proc sql;
create table upd (drop=count) as
select sex, count(*) as count from class
where sex ne ' '
group by sex
order by count descending;
create table want as
select name, coalesce(a.sex,b.sex) as sex, age, height, weight
from class a, upd (obs=1) b;
drop table upd;
quit;
What do you mean by values of observation with largest frequency, Can you be more specific.
If you want all the missing characters to be replaced by some other values then you can use arrays as following:
data want;
set sashelp.class;
array char _Character_;
do over char;
if missing(char) then char="Some value";
end;
run;
Here's a solution. I would imagine that there are plenty of better solutions. For example if there is an equal amount of the most common it will just pick one of them.
DATA have;
INPUT ID Imp $;
*---+----1----+----2----+----3----+----4----+----5;
CARDS;
1 A
2 A
3 .
4 D
5 D
6 D
;
Proc sql outobs=1;
create table mostcommon as
select t1.imp, count(t1.imp) AS Count
from work.have as t1
where t1.imp is not missing
group by t1.imp
order by CALCULATED count desc
;
quit;
Proc sql;
create table want as
select have.ID, COALESCEC(have.Imp,mc.Imp) as Imp
from work.have as have
left join work.mostcommon as mc on 1=1
;
quit;
DATA have;
infile datalines missover;
INPUT ID Imp $;
CARDS;
1 A
2 A
3
4 D
5 D
6 D
;
proc sql noprint;
select Imp INTO:Rep
from (select Imp,COUNT(Imp) as count
from have
Group by Imp)
Having count=max(Count)
;
quit;
data want;
set Have;
array char _Character_;
do over char;
if missing(char) then char="&Rep";
end;
run;
NOTE: Character missing values are represented by blank, not period.
Thank you.
I didn't know about
infile datalines missover;
First, you have to find the most common values, e.g.:
proc freq data=sashelp.class noprint order=freq;
tables age /out=age;
tables height/ out=height;
tables sex/ out=sex;
run;
Then, the easiest may be to put the values in macro variables:
data _null_;
merge age sex height; /* with ORDER=FREQ, the first obs is the most common value */
call symputx('age',age);
call symputx('height',height);
call symputx('sex',sex);
stop;
run;
Given this test data:
data test;
set sashelp.class;
if _N_>8 then
call missing(height,age);
if name =:'L' then
sex=' ';
run;
We can now fill in the blanks:
data want;
set test;
age=coalesce(age,&age);
height=coalesce(height,&height);
sex=coalescec(sex,"&sex");
run;
Thank you all for the response.
All solutions were good . But as I understand it I can accept one as the solution.
So selecting the first one.
Thanks again
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.