<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593784#M170482</link>
    <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards dlm = "|";
input X : $ 10. Y : $ 10. Z;
cards;
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |Yes|.
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |   |.
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |Yes|.
;
run;


proc transpose data=have(obs=0) out=temp;
var _all_;
run;

proc sql;
select cats('nmiss(',_name_ ,')/count(*) as p_',_name_) into : list separated by ','
 from temp;

create table percent_missing as
select &amp;amp;list
 from have ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Thu, 03 Oct 2019 15:45:54 GMT</pubDate>
    <dc:creator>Ksharp</dc:creator>
    <dc:date>2019-10-03T15:45:54Z</dc:date>
    <item>
      <title>Find char missing values in a dataset and remove the features with more than 10% of missing values</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592951#M170084</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to find out the missing values in each feature of the dataset and remove the features which have greater than 10% missing values.&lt;/P&gt;&lt;P&gt;Any help would be appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 09:06:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592951#M170084</guid>
      <dc:creator>Bond007</dc:creator>
      <dc:date>2019-10-01T09:06:32Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592954#M170087</link>
      <description>&lt;P&gt;Hi and welcome to the SAS Community! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What is a 'feature' in a data set? Are you able to give a small example of what your data looks like and what you want the desired result to look like? Makes it much easier to provide a usable code answer.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 09:13:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592954#M170087</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2019-10-01T09:13:09Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592973#M170101</link>
      <description>&lt;P&gt;The dataset has different numerical features and categorical features.&lt;/P&gt;&lt;P&gt;For example:&lt;/P&gt;&lt;P&gt;X&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Y&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Z&lt;/P&gt;&lt;P&gt;High&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Yes&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2.56&lt;/P&gt;&lt;P&gt;‘ ‘&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; No&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3.67&lt;/P&gt;&lt;P&gt;Medium&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ‘ ‘&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
.&lt;/P&gt;&lt;P&gt;Low&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ‘ ‘&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;/P&gt;&lt;P&gt;‘ ‘&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ‘ ‘ &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; .&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;and so on....&lt;/P&gt;&lt;P&gt;X,Y,Z might be 3 features of this dataset, where X and Y are Categorical features. I want to find missing values for X and for Y and then remove X or Y if the missing values are greater than 10% in either of them.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I used this..&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;/*create format for missing*/&lt;BR /&gt;proc format;&lt;BR /&gt;value $missfmt ' '='Missing' other='Not Missing';&lt;BR /&gt;*value missfmt . 
='Missing' other='Not Missing';&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;/*find out the missing char values*/&lt;BR /&gt;proc freq data = trainingdata;&lt;BR /&gt;format _char_ $missfmt.;&lt;BR /&gt;tables _char_ / missing nocum nopercent out=tempstr;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am able to see the missing and not-missing counts for each feature, but I am not able to fetch the missing count from this output so that I can divide it by the total number of values and get the percentage.&lt;/P&gt;&lt;P&gt;I am performing the numerical and categorical missing-value searches separately, as proc means didn't work for tables=_char_&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 10:22:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592973#M170101</guid>
      <dc:creator>Bond007</dc:creator>
      <dc:date>2019-10-01T10:22:05Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592978#M170105</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/293144"&gt;@Bond007&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you need the percent values, what about removing the NOPERCENT keyword from your code?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 10:51:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592978#M170105</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-10-01T10:51:29Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592989#M170112</link>
      <description>&lt;P&gt;I have removed the nopercent from the code.&amp;nbsp;&lt;/P&gt;&lt;P&gt;The output pattern is as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; The SAS System&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;The FREQ Procedure&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X&lt;/P&gt;&lt;P&gt;X&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Frequency&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Percent&lt;/P&gt;&lt;P&gt;Not missing&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1000&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; equivalent percentage 
value&lt;/P&gt;&lt;P&gt;Missing&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 900&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; equivalent percentage value&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Y&lt;/P&gt;&lt;P&gt;Y&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Frequency&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Percent&lt;/P&gt;&lt;P&gt;Not missing&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 800&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; equivalent percentage value&lt;/P&gt;&lt;P&gt;Missing&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1100&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; equivalent percentage value&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;and so 
on.....&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to fetch the percentage of missing values for X and Y and then remove the features which have more than 10% missing values.&lt;/P&gt;&lt;P&gt;I tried to transpose the output dataset, expecting to get a new dataset that looks like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Missing&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Not missing&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Percent&lt;/P&gt;&lt;P&gt;X&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 900&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1000&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;....&lt;/P&gt;&lt;P&gt;Y&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1100&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 800&amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ....&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But it doesn't work.&lt;/P&gt;&lt;P&gt;I want to fetch missing percent for X and missing percent for Y from the output and then check if it is greater than 10%.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 11:24:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/592989#M170112</guid>
      <dc:creator>Bond007</dc:creator>
      <dc:date>2019-10-01T11:24:44Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593006#M170116</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/293144"&gt;@Bond007&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;do you mean something like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards dlm = "|";
input X : $ 10. Y : $ 10. Z;
cards;
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |Yes|.
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |   |.
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |Yes|.
;
run;


data _null_;
  sentinel1 = .;
  if 0 then set have;
  sentinel2=.;
  array _C_{*} sentinel1-character-sentinel2;
  call symputX("_DIM_",dim(_C_),"G");  /* collect the metadata */
  stop;
run;

options symbolgen;
data _null_;

  sentinel1 = .;
  if 0 then set have;
  sentinel2=.;

  array _C_ sentinel1-character-sentinel2;
  array CNT[&amp;amp;_DIM_.] _temporary_;

  d=dim(_C_);
  put d=;
  do until (eof);
    set have end=eof nobs=nobs;
    do over _C_;
      if MISSING (_C_) then CNT[_I_] + 1;
    end;
  end;

  call execute('data want; set have; drop _N_');
  do over _C_;
    if CNT[_I_]/NOBS &amp;gt; 0.10 then call execute(vname(_C_)); /*Y stays, X drops */
  end;
  call execute('; run;');
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;All the best&lt;/P&gt;&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 12:48:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593006#M170116</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2019-10-01T12:48:25Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593038#M170127</link>
      <description>&lt;P&gt;So it seems you have coded your "features" as SAS variables, and you already know how to count the number of missing values. To see whether that count is more or less than 10%, you just need to count how many observations there are and divide.&amp;nbsp; Since you have both the missing and non-missing counts, the total is just the sum of those two numbers.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Oct 2019 14:03:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593038#M170127</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-10-01T14:03:27Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593784#M170482</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
infile cards dlm = "|";
input X : $ 10. Y : $ 10. Z;
cards;
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |Yes|.
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |   |.
High  |Yes|2.56
      |No |3.67
Medium|Yes|.
Low   |Yes|2
      |Yes|.
;
run;


proc transpose data=have(obs=0) out=temp;
var _all_;
run;

proc sql;
select cats('nmiss(',_name_ ,')/count(*) as p_',_name_) into : list separated by ','
 from temp;

create table percent_missing as
select &amp;amp;list
 from have ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 03 Oct 2019 15:45:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593784#M170482</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2019-10-03T15:45:54Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593830#M170499</link>
      <description>&lt;P&gt;Use the CLASS statement in PROC SUMMARY.&amp;nbsp; This should work as long as there are not too many variables.&lt;/P&gt;
&lt;P&gt;So define your two formats:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
  value $missfmt ' '='Missing' other='Not Missing';
  value missfmt low-high='Not Missing' other ='Missing' ;
run;&lt;/CODE&gt;&lt;/PRE&gt;
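&lt;P&gt;As a quick sanity check of the two formats, here is a minimal sketch. It assumes the PROC FORMAT step above has already run; the variable names c and n are made up for illustration. Blank character values and missing numeric values should both display as Missing, and anything else as Not Missing.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  length c $8;
  c = ' ';    n = .;      /* both values are missing */
  put c= $missfmt. n= missfmt.;
  c = 'High'; n = 2.56;   /* both values are populated */
  put c= $missfmt. n= missfmt.;
run;&lt;/CODE&gt;&lt;/PRE&gt;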
&lt;P&gt;Pick your dataset.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let dsn=sashelp.cars;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Get the list of variables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc transpose data=&amp;amp;dsn(obs=0) out=step1;
  var _all_;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Summarize:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc summary data = &amp;amp;dsn missing chartype ;
  class _all_ ;
  ways 1;
  output out=step2 ;
  format _char_ $missfmt. _numeric_ missfmt.;
run;&lt;/CODE&gt;&lt;/PRE&gt;
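&lt;P&gt;A hedged illustration of what the CHARTYPE option provides: _TYPE_ becomes a string of 0s and 1s with one position per CLASS variable, and WAYS 1 keeps only the rows where exactly one position is 1, so the position of that 1 identifies the variable. The _TYPE_ value below is a made-up example for a 15-variable dataset, not actual output.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  length _type_ $15;
  _type_ = '000010000000000';     /* hypothetical value: only the 5th class variable is active */
  varnum = indexc(_type_, '1');   /* position of the first '1' in the string */
  put varnum=;                    /* writes varnum=5 to the log */
run;&lt;/CODE&gt;&lt;/PRE&gt;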
&lt;P&gt;Then collapse to one observation per variable.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  length varnum 8 _name_ $32. _label_ $256 missing non_missing percent_missing 8;
  keep varnum -- percent_missing;
  do until(last._type_);
    set step2;
    by _type_;
    varnum=indexc(_type_,'1');
    p=varnum;
    set step1 point=p;
    if vvaluex(_name_)='Missing' then missing=_freq_;
    else non_missing=_freq_;
  end;
  missing=sum(0,missing);
  non_missing=sum(0,non_missing);
  percent_missing=missing/(sum(missing,non_missing));
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note that the variables will come out in the reverse of the order in which they exist in the dataset, so sort by varnum to restore the original order.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=want; by varnum; run;&lt;/CODE&gt;&lt;/PRE&gt;
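&lt;P&gt;To finish the original task of removing variables with more than 10% missing values, one possible sketch follows. It assumes the WANT dataset and the dsn macro variable from the steps above; the names threshold, keeplist and trimmed are invented for this illustration. It builds a KEEP list rather than a DROP list so that the statement is not empty when no variable exceeds the cutoff.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let threshold = 0.10;  /* assumed cutoff: 10% missing, from the original question */
%let keeplist = ;       /* INTO leaves this unchanged if no row qualifies */

proc sql noprint;
  select _name_ into :keeplist separated by ' '
    from want
    where percent_missing &amp;lt;= &amp;amp;threshold;
quit;

data trimmed;           /* hypothetical output dataset name */
  set &amp;amp;dsn;
  keep &amp;amp;keeplist;       /* caveat: the list is empty if every variable exceeds the cutoff */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;For reference, the listing that follows shows the sorted WANT dataset for sashelp.cars.&lt;/P&gt;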
&lt;PRE&gt;                                                                non_       percent_
Obs    varnum    _name_         _label_            missing    missing       missing

  1       1      Make                                 0         428      .000000000
  2       2      Model                                0         428      .000000000
  3       3      Type                                 0         428      .000000000
  4       4      Origin                               0         428      .000000000
  5       5      DriveTrain                           0         428      .000000000
  6       6      MSRP                                 0         428      .000000000
  7       7      Invoice                              0         428      .000000000
  8       8      EngineSize     Engine Size (L)       0         428      .000000000
  9       9      Cylinders                            2         426      .004672897
 10      10      Horsepower                           0         428      .000000000
 11      11      MPG_City       MPG (City)            0         428      .000000000
 12      12      MPG_Highway    MPG (Highway)         0         428      .000000000
 13      13      Weight         Weight (LBS)          0         428      .000000000
 14      14      Wheelbase      Wheelbase (IN)        0         428      .000000000
 15      15      Length         Length (IN)           0         428      .000000000

&lt;/PRE&gt;</description>
      <pubDate>Thu, 03 Oct 2019 17:36:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593830#M170499</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-10-03T17:36:02Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593861#M170521</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/35763"&gt;@yabwon&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;Barteku, as a matter of dynamic hash programming curiosity, the entire feat can be pulled off in a single step, without crossing a step boundary even once:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have ;
  input (X Y) ($) Z ;
  cards ;
High   Yes 2.56
.      No  3.67
Medium Yes  .
Low    Yes 2
.      Yes  .
High   Yes 2.56
.      No  3.67
Medium Yes  .
Low    Yes 2
.      .    .
High   Yes 2.56
.      No  3.67
Medium Yes  .
Low    Yes 2
.      Yes  .
;
run ;

%let pct = 10 ;

data _null_ ;
  dcl hash v () ;
  v.definekey ("vn") ;
  v.definedata ("vn", "nmiss") ;
  v.definedone () ;
  dcl hiter iv ("v") ;
  do until (lr) ;
    set have end = lr nobs = n ;
    array nn _numeric_ ;
    array cc _char_ ;
    do over nn ;
      if not cmiss (nn) then continue ;
      vn = put (vname (nn), $32.) ;
      link count ;
    end ;                                                                                                                                                                                                                                                       
    do over cc ;                                                                                                                                                                                                                                                
      if not cmiss (cc) then continue ;                                                                                                                                                                                                                         
      vn = put (vname (cc), $32.) ;                                                                                                                                                                                                                             
      link count ;                                                                                                                                                                                                                                              
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
  dcl hash h (dataset:"have", multidata:"y") ;                                                                                                                                                                                                                  
  do while (iv.next() = 0) ;                                                                                                                                                                                                                                     
    if divide (nmiss, n) * 100 &amp;lt; &amp;amp;pct then h.definekey (vn) ;                                                                                                                                                                                                   
  end ;                                                                                                                                                                                                                                                         
  h.definedone() ;                                                                                                                                                                                                                                              
  h.output (dataset: "want") ;                                                                                                                                                                                                                                  
  stop ;                                                                                                                                                                                                                                                        
  count: if v.find() then nmiss = 1 ;                                                                                                                                                                                                                      
         else             nmiss + 1 ;                                                                                                                                                                                                                      
         v.replace() ;                                                                                                                                                                                                                                          
run ;                                                   
&lt;/CODE&gt;&lt;/PRE&gt;
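&lt;P&gt;To see which variables survived, a quick check along these lines should do (a minimal sketch, assuming WANT was written to the WORK library as in the step above):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* list the columns of WORK.WANT from the dictionary tables */
proc sql ;
  select name, type
    from dictionary.columns
   where libname = "WORK" and memname = "WANT" ;
quit ;&lt;/CODE&gt;&lt;/PRE&gt;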
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Oct 2019 19:18:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593861#M170521</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-10-03T19:18:49Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593877#M170529</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262"&gt;@hashman&lt;/a&gt;&amp;nbsp;:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Paul, I agree 100%: a hash table gives us an elegant "one step" solution. The only reason I used two data steps was to get the &lt;CODE&gt;&amp;amp;_DIM_.&lt;/CODE&gt; macro variable.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In fact, since all the loops are "over _C_", I could replace it with a static size, and it would work just as well in one data step, like:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;array CNT[1000000] _temporary_; /* ~8MB of RAM, for quite "wide" dataset ;) */&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;but I thought it wouldn't be "nice/elegant".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But, as you remember from our SAS-L discussion titled "Array search concepts (was: Re: Help?)", I'm finishing/polishing this dynamic array concept, which would allow us to write something like:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;...
 array _C_ sentinel1-character-sentinel2;
 call arrayCNT('Allocate', lbound(_C_), hbound(_C_));
...
 do over _C_;
  if MISSING (_C_) then call arrayCNT('Add', _I_, 1);
 end;
...&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;All the best&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Bart&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Oct 2019 19:45:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593877#M170529</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2019-10-03T19:45:42Z</dc:date>
    </item>
    <item>
      <title>Re: Find char missing values in a dataset and remove the features with more than 10% of missing valu</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593975#M170583</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/35763"&gt;@yabwon&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;Bart, as far as elegance is concerned, I have the same kind of aesthetic aversion to "big enough" allocations. I do remember our -L exchange and am looking forward to seeing the finished product whenever you deem it to have been sufficiently Polished ;). I've learned quite a few FCMP tricks of the trade from you and strongly suspect I will learn more from this one.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Aesthetics aside, "big enough" can be extremely useful, as you well know, and quite productive speed-wise. This is because allocating all the memory in one fell swoop at compile time is much faster than allocating it at run time one item at a time. A really big hash object table spends most of its time not so much on search and retrieve operations as on looking for available memory before each insert, especially when its memory footprint approaches the system limits. As a result, a program can run for a long time, only to fail on the next insertion when no more memory is available; whereas if an open-addressed array-based hash table is allocated as a big array at compile time, it either fails at once or else works to the end without a glitch.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you've seen from my -L writings, with a single integer key&amp;nbsp;an open-addressed array-based hash table wins hands down in both insert/search speed (~1:2) and memory usage (~1:3 even with a half-sparse table). It also excels at aggregation because it can be done directly in the corresponding array cells (rather than via the FIND-aggregate-REPLACE hash object cycle). The speed advantage comes from the hash function, which in this case is the simple mod(key,table_size). The hash object readily takes over in terms of search speed for character keys and composite keys of mixed types, since against such keys its internal hash function is much faster than what can be mustered via:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;mod (input (md5 (catx (k1-kN)), pib6.), table_size)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;externally - this combo just takes too long to compute, and a slow hash function defeats the purpose of using hashing in the first place. If SAS surfaced a really tight and fast function that could be used against an arbitrary key, array-based hashing would be much more attractive.&lt;/P&gt;
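&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For illustration only, here is a minimal sketch of aggregating directly in the cells of an open-addressed array table with a single integer key. It is not the program from the attached file. The data set TRANS, the variables KEY and AMOUNT, and the table size of 1009 are made up, and the sketch assumes non-missing, non-negative integer keys and fewer distinct keys than table slots:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let hsize = 1009 ; /* assumed table size: a prime above the number of distinct keys */

data trans ; /* made-up sample input */
  input key amount ;
  cards ;
101 5
205 2
101 1
309 4
205 3
;
run ;

data agg (keep = key total) ;
  array hk [0:%eval(&amp;amp;hsize - 1)] _temporary_ ; /* key slots, missing = empty */
  array hs [0:%eval(&amp;amp;hsize - 1)] _temporary_ ; /* running totals, parallel to hk */
  do until (lr) ;
    set trans end = lr ;
    x = mod (key, &amp;amp;hsize) ; /* the hash function */
    /* linear probing: step to the next slot, wrapping around, until the key or an empty slot is found */
    do while (hk[x] ne . and hk[x] ne key) ;
      x = mod (x + 1, &amp;amp;hsize) ;
    end ;
    if hk[x] = . then hk[x] = key ; /* first time this key is seen */
    hs[x] = sum (hs[x], amount) ;   /* aggregate in place, no FIND/REPLACE cycle */
  end ;
  /* unload the occupied slots */
  do x = 0 to &amp;amp;hsize - 1 ;
    if hk[x] = . then continue ;
    key = hk[x] ;
    total = hs[x] ;
    output ;
  end ;
  stop ;
run ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Both arrays are allocated in full at compile time, so if the memory is not available the step fails at once rather than part-way through the input.&lt;/P&gt;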
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;p.s. If you want to see how a hash object table fares against an open-addressed array table with a single integer key, try the program in the attached file.&amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Oct 2019 22:36:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-char-missing-values-in-a-dataset-and-remove-the-features/m-p/593975#M170583</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-10-03T22:36:18Z</dc:date>
    </item>
  </channel>
</rss>

