<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Imputing missing values of character variables in a large data set in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465632#M118781</link>
    <description>&lt;P&gt;First, you have to find the most common values, e.g.:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;proc freq data=sashelp.class noprint order=freq;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; tables age /out=age;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; tables height/ out=height;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; tables sex/ out=sex;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Then, the easiest may be to put the values in macro variables:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data _null_;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; merge age sex height; /* with ORDER=FREQ, the first obs is the most common value */&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; call symputx('age',age);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; call symputx('height',height);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; call symputx('sex',sex);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; stop;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Given this test data:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data test;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; set sashelp.class;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; if _N_&amp;gt;8 then&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; &amp;nbsp; call missing(height,age);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; if name =:'L' then&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; &amp;nbsp; sex=' ';&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;We can now fill in the blanks:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data want;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; set test;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; age=coalesce(age,&amp;amp;age);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; height=coalesce(height,&amp;amp;height);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; sex=coalescec(sex,"&amp;amp;sex");&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 29 May 2018 08:46:52 GMT</pubDate>
    <dc:creator>s_lassen</dc:creator>
    <dc:date>2018-05-29T08:46:52Z</dc:date>
    <item>
      <title>Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465481#M118734</link>
      <description>&lt;P&gt;Imputing missing numeric values is well discussed and solutions are well defined.&lt;BR /&gt;Is there a way we can replace a missing value of character variable with the values of observation with largest frequency?&lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2018 14:05:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465481#M118734</guid>
      <dc:creator>thesasuser</dc:creator>
      <dc:date>2018-05-28T14:05:31Z</dc:date>
    </item>
    <item>
      <title>Re: Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465488#M118735</link>
      <description>&lt;P&gt;See a rather brute force example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* create example data */
data class;
set sashelp.class end=done;
output;
if done
then do;
  call missing(of _all_);
  output;
end;
run;

/* update  with largest freq */
proc sql;
create table upd (drop=count) as
select sex, count(*) as count from class
where sex ne ' '
group by sex
order by count descending;
create table want as
select name, coalesce(a.sex,b.sex) as sex, age, height, weight
from class a, upd (obs=1) b;
drop table upd;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 May 2018 14:57:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465488#M118735</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-05-28T14:57:32Z</dc:date>
    </item>
    <item>
      <title>Re: Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465489#M118736</link>
      <description>&lt;P&gt;Yes there are ways. Show us your data to get a code answer&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2018 14:57:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465489#M118736</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2018-05-28T14:57:45Z</dc:date>
    </item>
    <item>
      <title>Re: Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465490#M118737</link>
      <description>&lt;P&gt;What do you mean by &lt;STRONG&gt;values of observation with largest frequency,&lt;/STRONG&gt; Can you be more specific.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you want all the missing characters to be replaced by some other values then you can use arrays as following:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
set sashelp.class;
array char _Character_;
do over char;
if missing(char) then char="Some value";
end;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 28 May 2018 14:57:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465490#M118737</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2018-05-28T14:57:53Z</dc:date>
    </item>
    <item>
      <title>Re: Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465492#M118738</link>
      <description>&lt;P&gt;Here's a solution. I would imagine that there are plenty of better solutions. For example if there is an equal amount of the most common it will just pick one of them.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA have;
INPUT ID Imp $;
*---+----1----+----2----+----3----+----4----+----5;
CARDS;
1 A
2 A
3 .
4 D
5 D
6 D
;

Proc sql outobs=1;
create table mostcommon as
select t1.imp, count(t1.imp) AS Count
from work.have as t1
where t1.imp is not missing
group by t1.imp
order by CALCULATED count desc
;
quit;

Proc sql;
create table want as
select have.ID, COALESCEC(have.Imp,mc.Imp) as Imp
from work.have as have
left join work.mostcommon as mc on 1=1
;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2018 14:59:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465492#M118738</guid>
      <dc:creator>DanielLangley</dc:creator>
      <dc:date>2018-05-28T14:59:03Z</dc:date>
    </item>
    <item>
      <title>Re: Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465498#M118739</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA have;
infile datalines missover;
INPUT ID Imp $;
CARDS;
1 A
2 A
3
4 D
5 D
6 D
;

proc sql noprint;
select Imp INTO:Rep
from (select Imp,COUNT(Imp) as count
		from have
			Group by Imp)
Having count=max(Count)
;
quit;


data want;
set Have;
array char _Character_;
do over char;
if missing(char) then char="&amp;amp;Rep";
end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;NOTE: Character missing values are represented by blank, not period.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2018 15:23:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465498#M118739</guid>
      <dc:creator>SuryaKiran</dc:creator>
      <dc:date>2018-05-28T15:23:33Z</dc:date>
    </item>
    <item>
      <title>Re: Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465513#M118741</link>
      <description>&lt;P&gt;Thank you.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I didn't know about&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;&lt;SPAN class="token statement"&gt;infile&lt;/SPAN&gt; datalines missover&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 28 May 2018 17:35:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465513#M118741</guid>
      <dc:creator>DanielLangley</dc:creator>
      <dc:date>2018-05-28T17:35:42Z</dc:date>
    </item>
    <item>
      <title>Re: Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465632#M118781</link>
      <description>&lt;P&gt;First, you have to find the most common values, e.g.:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;proc freq data=sashelp.class noprint order=freq;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; tables age /out=age;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; tables height/ out=height;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; tables sex/ out=sex;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Then, the easiest may be to put the values in macro variables:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data _null_;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; merge age sex height; /* with ORDER=FREQ, the first obs is the most common value */&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; call symputx('age',age);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; call symputx('height',height);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; call symputx('sex',sex);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; stop;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Given this test data:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data test;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; set sashelp.class;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; if _N_&amp;gt;8 then&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; &amp;nbsp; call missing(height,age);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; if name =:'L' then&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; &amp;nbsp; sex=' ';&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;We can now fill in the blanks:&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;data want;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; set test;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; age=coalesce(age,&amp;amp;age);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; height=coalesce(height,&amp;amp;height);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp; sex=coalescec(sex,"&amp;amp;sex");&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 29 May 2018 08:46:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/465632#M118781</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2018-05-29T08:46:52Z</dc:date>
    </item>
    <item>
      <title>Re: Imputing missing values of character variables in a large data set</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/466399#M119001</link>
      <description>&lt;P&gt;Thank you all for the response.&lt;/P&gt;
&lt;P&gt;All solutions were good . But as I understand it I can accept one as the solution.&lt;/P&gt;
&lt;P&gt;So selecting the first one.&lt;/P&gt;
&lt;P&gt;Thanks again&lt;/P&gt;</description>
      <pubDate>Thu, 31 May 2018 11:22:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Imputing-missing-values-of-character-variables-in-a-large-data/m-p/466399#M119001</guid>
      <dc:creator>thesasuser</dc:creator>
      <dc:date>2018-05-31T11:22:54Z</dc:date>
    </item>
  </channel>
</rss>

