<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Select Observations at Random for Data Checking in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Select-Observations-at-Random-for-Data-Checking/m-p/289374#M59762</link>
    <description>&lt;P&gt;If you want a random sample, another way would be to use PROC SURVEYSELECT. The SAMPRATE= option selects a percentage of records, and the SAMPSIZE= option selects a specific number of records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Depending on the number of records and the conditions you need to check, you might instead write code that looks for specific problems.&lt;/P&gt;
&lt;P&gt;Suppose you have a variable that should never have a value greater than 10:&lt;/P&gt;
&lt;P&gt;Proc sql;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; select count(*)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; from dataset&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; where variablex &amp;gt; 10;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That would tell you how many records have an invalid value.&lt;/P&gt;
&lt;P&gt;You can also combine conditions, such as variablex&amp;gt;5 and missing(variabley), if variabley should have a value whenever variablex is greater than 5.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have multiple variables that should share the same range of non-missing values, custom formats may help. Suppose I have some variables that should only have values of 1, 2, 3, 4, 5 (a 1 to 5 scale) or 9 to indicate no opinion, as in a typical survey question.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Proc format;&lt;/P&gt;
&lt;P&gt;value validscale&lt;/P&gt;
&lt;P&gt;1, 2, 3, 4, 5,9='Valid'&lt;/P&gt;
&lt;P&gt;other='Invalid'&lt;/P&gt;
&lt;P&gt;;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc freq data=have;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; tables &amp;lt;list the variables with that code&amp;gt;;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; format &amp;lt;same variables&amp;gt; validscale. ;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;That would give you tables with counts of invalid values.&lt;/P&gt;
&lt;P&gt;Unless the data set is extremely large, this approach validates every record for those variables rather than just a sample.&lt;/P&gt;</description>
    <pubDate>Wed, 03 Aug 2016 21:58:08 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2016-08-03T21:58:08Z</dc:date>
    <item>
      <title>Select Observations at Random for Data Checking</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Observations-at-Random-for-Data-Checking/m-p/289362#M59759</link>
      <description>&lt;P&gt;Is there a simple way to select a number of observations at random when doing data checks?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have been using PROC PRINT w/ the FIRSTOBS and OBS options, but there must be a way to select a number of observations at random.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 20:46:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Observations-at-Random-for-Data-Checking/m-p/289362#M59759</guid>
      <dc:creator>_maldini_</dc:creator>
      <dc:date>2016-08-03T20:46:19Z</dc:date>
    </item>
    <item>
      <title>Re: Select Observations at Random for Data Checking</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Observations-at-Random-for-Data-Checking/m-p/289363#M59760</link>
      <description>&lt;P&gt;The approach I have used is to create a new variable in the data set with the RANUNI function and then use it to make a pseudo-random selection of a proportion of the records for audit.&amp;nbsp; I'm sure you could set up something similar in PROC SQL so that you don't need multiple passes over the data.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 21:01:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Observations-at-Random-for-Data-Checking/m-p/289363#M59760</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2016-08-03T21:01:34Z</dc:date>
    </item>
    <item>
      <title>Re: Select Observations at Random for Data Checking</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Observations-at-Random-for-Data-Checking/m-p/289374#M59762</link>
      <description>&lt;P&gt;If you want a random sample, another way would be to use PROC SURVEYSELECT. The SAMPRATE= option selects a percentage of records, and the SAMPSIZE= option selects a specific number of records.&lt;/P&gt;
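&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a minimal sketch, assuming the data set is named have (a placeholder) and that 25 records are enough to review; the output name sample and the seed value are also arbitrary choices:&lt;/P&gt;
&lt;P&gt;/* Simple random sample without replacement; data set names and seed are placeholders */&lt;/P&gt;
&lt;P&gt;proc surveyselect data=have out=sample method=srs sampsize=25 seed=12345;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;/* Print the selected records for manual checking */&lt;/P&gt;
&lt;P&gt;proc print data=sample;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;Replacing sampsize=25 with samprate=0.05 should draw about 5 percent of the records instead, and a fixed seed makes the draw repeatable.&lt;/P&gt;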
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Depending on the number of records and the conditions you need to check, you might instead write code that looks for specific problems.&lt;/P&gt;
&lt;P&gt;Suppose you have a variable that should never have a value greater than 10:&lt;/P&gt;
&lt;P&gt;Proc sql;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; select count(*)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; from dataset&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; where variablex &amp;gt; 10;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That would tell you how many records have an invalid value.&lt;/P&gt;
&lt;P&gt;You can also combine conditions, such as variablex&amp;gt;5 and missing(variabley), if variabley should have a value whenever variablex is greater than 5.&lt;/P&gt;
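&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A sketch of such a combined check, assuming the data set is named have (a placeholder) and using n_invalid as an arbitrary column label:&lt;/P&gt;
&lt;P&gt;/* Count records where variabley is missing although variablex is greater than 5 */&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; select count(*) as n_invalid&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; from have&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; where variablex &amp;gt; 5 and missing(variabley);&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;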
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have multiple variables that should share the same range of non-missing values, custom formats may help. Suppose I have some variables that should only have values of 1, 2, 3, 4, 5 (a 1 to 5 scale) or 9 to indicate no opinion, as in a typical survey question.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Proc format;&lt;/P&gt;
&lt;P&gt;value validscale&lt;/P&gt;
&lt;P&gt;1, 2, 3, 4, 5,9='Valid'&lt;/P&gt;
&lt;P&gt;other='Invalid'&lt;/P&gt;
&lt;P&gt;;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc freq data=have;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; tables &amp;lt;list the variables with that code&amp;gt;;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; format &amp;lt;same variables&amp;gt; validscale. ;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;That would give you tables with counts of invalid values.&lt;/P&gt;
&lt;P&gt;Unless the data set is extremely large, this approach validates every record for those variables rather than just a sample.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 21:58:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Observations-at-Random-for-Data-Checking/m-p/289374#M59762</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-08-03T21:58:08Z</dc:date>
    </item>
  </channel>
</rss>

