<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Basic data quality check -- how to determine percent missing for key variables? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700476#M214374</link>
    <description>&lt;P&gt;I accepted FreelanceReinhard's solution because it's simple and easy to apply, and it serves my purposes quite well for this particular data.&amp;nbsp; But I will make a note of Reeza's macro, as I anticipate it could be useful in the future for larger/more complex datasets.&amp;nbsp; Thanks!&lt;/P&gt;</description>
    <pubDate>Fri, 20 Nov 2020 14:28:11 GMT</pubDate>
    <dc:creator>Wolverine</dc:creator>
    <dc:date>2020-11-20T14:28:11Z</dc:date>
    <item>
      <title>Basic data quality check -- how to determine percent missing for key variables?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700258#M214289</link>
      <description>&lt;P&gt;This seems like it should be quite easy to do, but I haven't found a way yet.&amp;nbsp; I have a large file (millions of records) with names, dates of birth, addresses, etc, and I want to know the missing/non-missing percentages for each of these variables.&amp;nbsp; I don't want to run proc freq, as this will give me a huge list of every name in the dataset.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My work-around so far has been to recode these types of variables into dichotomous variables (ie, 0=missing and 1=non-missing), and then run freqs.&amp;nbsp; But that's time consuming and tedious.&amp;nbsp; Isn't there an option in proc freq or another procedure to do this automatically?&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2020 16:44:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700258#M214289</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2020-11-19T16:44:07Z</dc:date>
    </item>
    <item>
      <title>Re: Basic data quality check -- how to determine percent missing for key variables?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700263#M214290</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/43822"&gt;@Wolverine&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/43822"&gt;@Wolverine&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I don't want to run proc freq, as this will give me a huge list of every name in the dataset.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;This doesn't happen if you apply a suitable &lt;EM&gt;format&lt;/EM&gt; to the variables in question such as&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
value miss
._-.z = 'missing'
other = 'non-missing';

value $miss
' ' = 'missing'
other = 'non-missing';
run;

proc freq data=sashelp.heart;
format _numeric_ miss. _char_ $miss.;
tables _all_ / missing;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 19 Nov 2020 16:59:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700263#M214290</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2020-11-19T16:59:04Z</dc:date>
    </item>
    <item>
      <title>Re: Basic data quality check -- how to determine percent missing for key variables?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700266#M214291</link>
      <description>&lt;P&gt;Here's a variant on the program from&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp;that cleans up the display into a few nice options.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/*This program creates a report with the number and percent of
missing data for each variable in the data set.
The ony change should be to the macro variable, INPUT_DSN.
Author: F. Khurshed
Date: 2019-01-04
*/
*create sample data to work with;

data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;

*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;

proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&amp;amp;INPUT_DSN.;
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

proc sort data=long;
    by variable;
run;

*make it a wide data set for presentation, with values as N (Percent);

proc transpose data=long out=wide_presentation (drop=_name_);
    by variable;
    id variable_value;
    var presentation;
run;

*transpose only N;

proc transpose data=long out=wide_N prefix=N_;
    by variable;
    id variable_value;
    var frequency;
run;

*transpose only percents;

proc transpose data=long out=wide_PCT prefix=PCT_;
    by variable;
    id variable_value;
    var percent;
run;

*final output file;

data &amp;amp;Output_DSN.;
    merge wide_N wide_PCT wide_presentation;
    by variable;
    drop _name_;
    label N_Missing='# Missing' N_Not_Missing='# Not Missing' 
        PCT_Missing='% Missing' N_Not_Missing='% Not Missing' Missing='Missing' 
        Not_missing='Not Missing';
run;

title "Missing Report of &amp;amp;INPUT_DSN.";

proc print data=&amp;amp;output_dsn. noobs label;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Change the macro variables in the code and re-run it on your own data set to get your own results.&lt;/P&gt;
&lt;P&gt;It should be pretty fast.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;*change these to what you need in code above;
%let INPUT_DSN = class; 
%let OUTPUT_DSN = want;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/43822"&gt;@Wolverine&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;This seems like it should be quite easy to do, but I haven't found a way yet.&amp;nbsp; I have a large file (millions of records) with names, dates of birth, addresses, etc, and I want to know the missing/non-missing percentages for each of these variables.&amp;nbsp; I don't want to run proc freq, as this will give me a huge list of every name in the dataset.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My work-around so far has been to recode these types of variables into dichotomous variables (ie, 0=missing and 1=non-missing), and then run freqs.&amp;nbsp; But that's time consuming and tedious.&amp;nbsp; Isn't there an option in proc freq or another procedure to do this automatically?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Nov 2020 17:05:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700266#M214291</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-11-19T17:05:36Z</dc:date>
    </item>
    <item>
      <title>Re: Basic data quality check -- how to determine percent missing for key variables?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700459#M214361</link>
      <description>&lt;PRE&gt;%let lib=sashelp;
%let dsn=heart;



data _null_;
 set sashelp.vcolumn(where=(libname=upcase("&amp;amp;lib") and memname=upcase("&amp;amp;dsn"))) end=last;
 if _n_=1 then call execute('proc sql;create table want as select ');
 call execute(cat("nmiss(",name,")/(select count(*) from &amp;amp;lib..&amp;amp;dsn ) as ",name," format=percent8.2"));
 if last then call execute("from &amp;amp;lib..&amp;amp;dsn ;quit;" );
  else call execute(',');
run;

&lt;/PRE&gt;</description>
      <pubDate>Fri, 20 Nov 2020 13:07:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700459#M214361</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-11-20T13:07:30Z</dc:date>
    </item>
    <item>
      <title>Re: Basic data quality check -- how to determine percent missing for key variables?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700476#M214374</link>
      <description>&lt;P&gt;I accepted FreelanceReinhard's solution because it's simple and easy to apply, and it serves my purposes quite well for this particular data.&amp;nbsp; But I will make a note of Reeza's macro, as I anticipate it could be useful in the future for larger/more complex datasets.&amp;nbsp; Thanks!&lt;/P&gt;</description>
      <pubDate>Fri, 20 Nov 2020 14:28:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700476#M214374</guid>
      <dc:creator>Wolverine</dc:creator>
      <dc:date>2020-11-20T14:28:11Z</dc:date>
    </item>
    <item>
      <title>Re: Basic data quality check -- how to determine percent missing for key variables?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700556#M214400</link>
      <description>Just a note, that my code is not a macro.</description>
      <pubDate>Fri, 20 Nov 2020 16:48:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Basic-data-quality-check-how-to-determine-percent-missing-for/m-p/700556#M214400</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-11-20T16:48:12Z</dc:date>
    </item>
  </channel>
</rss>

