<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Finding incomplete/partial data in a dataset in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795691#M32947</link>
    <description></description>
    <pubDate>Fri, 11 Feb 2022 15:44:06 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2022-02-11T15:44:06Z</dc:date>
    <item>
      <title>Finding incomplete/partial data in a dataset</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795639#M32936</link>
      <description>&lt;P&gt;Hello there!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am currently a student working on a dataset, and one question I have is: is there a way to identify incomplete/partial data in a dataset?&lt;/P&gt;&lt;P&gt;Looking at the rows one by one isn't practical, as the dataset has over 900,000 rows.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for your time.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Feb 2022 12:25:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795639#M32936</guid>
      <dc:creator>TP055972</dc:creator>
      <dc:date>2022-02-11T12:25:51Z</dc:date>
    </item>
    <item>
      <title>Re: Finding incomplete/partial data in a dataset</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795640#M32937</link>
      <description>&lt;P&gt;What do you mean by&amp;nbsp;&lt;SPAN&gt;incomplete/partial data?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Can you provide some sample data? That makes it much easier to help you.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Feb 2022 12:30:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795640#M32937</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2022-02-11T12:30:15Z</dc:date>
    </item>
    <item>
      <title>Re: Finding incomplete/partial data in a dataset</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795641#M32938</link>
      <description>&lt;P&gt;Hi there!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What I mean by incomplete/partial is that a data entry has been made, but it doesn't contain the full text.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, an entry has been made in the City column: the user meant to enter Chicago (for example), but the dataset shows Chi.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a way to identify this type of data in SAS Viya or SAS Studio without going through all the rows of data?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Feb 2022 12:36:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795641#M32938</guid>
      <dc:creator>TP055972</dc:creator>
      <dc:date>2022-02-11T12:36:42Z</dc:date>
    </item>
    <item>
      <title>Re: Finding incomplete/partial data in a dataset</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795685#M32945</link>
      <description>&lt;P&gt;You will need to define what "incomplete / partial" means for every variable.&lt;/P&gt;
&lt;P&gt;Do you have a list of valid entries for the variable City? Here's an idea using SAS code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
   length City $ 25;
   input City $25.;
   datalines;
Boston
Chic
Detroit
L.A.
;

proc format;
   invalue ValidCities (upcase)
      'BOSTON',
      'CHICAGO',
      'DETROIT',
      'LOS ANGELES',
      'MIAMI' = 1
      other = 0
   ;
run;
   
   
data invalid;
   set have;
   
   if input(upcase(City), ValidCities.) = 0;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Instead of typing the values into PROC FORMAT, you can create the informat ValidCities from an imported list of valid cities.&lt;/P&gt;
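&lt;P&gt;Here is a minimal sketch of building the same ValidCities informat with the CNTLIN= option of PROC FORMAT. It assumes the valid names are already in a SAS data set. The data set valid_cities and its variable City are hypothetical; in practice the list could come from PROC IMPORT. PROC FORMAT stops with an error on repeated ranges, so duplicates are removed first.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical list of valid cities, e.g. read in with PROC IMPORT */
data valid_cities;
   length City $ 25;
   input City $25.;
   datalines;
Boston
Chicago
Detroit
Los Angeles
Miami
;

/* Control data set for CNTLIN=: one row per valid value */
data cntl;
   length fmtname $ 32 type $ 1 start $ 25 label $ 1 hlo $ 1;
   set valid_cities;
   fmtname = 'ValidCities';
   type    = 'I';          /* I = numeric informat                 */
   start   = upcase(City); /* compare in upper case                */
   label   = '1';          /* 1 = valid                            */
   hlo     = ' ';          /* ordinary range                       */
   default = 25;           /* default width, same as length of City */
   keep fmtname type start label hlo default;
run;

/* Repeated ranges make PROC FORMAT fail, so drop duplicates */
proc sort data=cntl nodupkey;
   by start;
run;

/* Append the OTHER range, which returns 0 for anything not listed */
data cntl;
   set cntl end=last;
   output;
   if last then do;
      start = ' ';
      label = '0';
      hlo   = 'O';         /* O = OTHER */
      output;
   end;
run;

proc format cntlin=cntl;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The informat is then used exactly as in the data step above, with if input(upcase(City), ValidCities.) = 0; keeping the records that are not in the list.&lt;/P&gt;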
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Maybe there are easier methods in Viya, but I don't think there is a magic method that validates all variables at once.&lt;/P&gt;</description>
      <pubDate>Fri, 11 Feb 2022 15:02:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795685#M32945</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2022-02-11T15:02:53Z</dc:date>
    </item>
    <item>
      <title>Re: Finding incomplete/partial data in a dataset</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795690#M32946</link>
      <description>Hi there!&lt;BR /&gt;&lt;BR /&gt;Alright, I will try out this method.&lt;BR /&gt;&lt;BR /&gt;Thank you for your time.&lt;BR /&gt;</description>
      <pubDate>Fri, 11 Feb 2022 15:42:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795690#M32946</guid>
      <dc:creator>TP055972</dc:creator>
      <dc:date>2022-02-11T15:42:21Z</dc:date>
    </item>
    <item>
      <title>Re: Finding incomplete/partial data in a dataset</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795691#M32947</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/410002"&gt;@TP055972&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hi there!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What I mean by incomplete/partial is that a data entry has been made, but it doesn't contain the full text.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, an entry has been made in the City column: the user meant to enter Chicago (for example), but the dataset shows Chi.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a way to identify this type of data in SAS Viya or SAS Studio without going through all the rows of data?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If I did not create the SAS data set, I run Proc Contents to get the properties of the variables as they exist. If the contents show that your variable is character with a length of 4, you know that NONE of the values can contain more than 4 characters (a value of "Chicago" stored in it becomes "Chic"). If you expect longer values, that indicates the data set was built incorrectly. Compare the results of Proc Contents with the description you were given of the data (you do have some description of what should be in the data, correct?).&lt;/P&gt;
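&lt;P&gt;Here is a sketch of that check. yourdatasetname is the placeholder used in this post, and City stands in for whichever character variable you suspect. PROC CONTENTS lists each variable's type and length. A short DATA step, which is an addition and not part of the original advice, then pulls the records whose value fills the variable's full declared length. That is where values truncated on the way in tend to show up.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Type and length of every variable in the data set */
proc contents data=yourdatasetname;
run;

/* Values that use every position of the declared length may have been cut off */
data maybe_truncated;
   set yourdatasetname;
   /* LENGTHN: length without trailing blanks; VLENGTH: declared length */
   if lengthn(City) = vlength(City);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A value that exactly fills the length is not necessarily wrong. It is only a candidate to review, but a large share of such values is a strong hint that the length is too short.&lt;/P&gt;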
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the properties are reasonable, the next thing I do is &lt;STRONG&gt;explore&lt;/STRONG&gt; the data. That typically means running summary statistics on numeric variables like price, height, weight and instrument measurements (NOT account numbers, phone numbers or ID numbers) to get the maximum, minimum and mean values and the number of values (lots of missing values may be of interest). It also means running frequencies on text variables to get a feel for what is in the data, regardless of what the documentation says should be there.&lt;/P&gt;
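&lt;P&gt;Here is a minimal sketch of the summary-statistics step. price, height and weight are hypothetical names standing in for your own measurement variables. Leave ID-type numbers out of the VAR statement.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* N = nonmissing count, NMISS = missing count, plus range and mean */
proc means data=yourdatasetname n nmiss min max mean;
   var price height weight;   /* hypothetical measurement variables */
run;&lt;/CODE&gt;&lt;/PRE&gt;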
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Proc Freq gives you a count of each value that appears in the data for each variable requested, so I often start there for the character variables. You can see all the values of a variable, which by default are sorted by value. Similar values should therefore appear together, such as "Chi", "Chicago", "Chic." or what have you. If those values look "partial", the next steps likely depend on the exact values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Basic code to get frequencies of all the character variables:&lt;/P&gt;
&lt;PRE&gt;proc freq data=yourdatasetname;
   tables _character_;
run;&lt;/PRE&gt;
&lt;P&gt;If there are some variables you don't want to look at, you can use the DROP= data set option to remove them from consideration:&lt;/P&gt;
&lt;PRE&gt;proc freq data=yourdatasetname (drop=thisvar thatvar someothervarname);
   tables _character_;
run;&lt;/PRE&gt;
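&lt;P&gt;Here is another way to find values like "Chi" without reading the whole listing. It is not from the original post, and it is a sketch that assumes a character variable named City. A PROC SQL self-join on the distinct values lists every pair where one value is the beginning of another, ignoring case.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
   /* One row per distinct nonmissing City value, with its count */
   create table city_counts as
   select City, count(*) as n
   from yourdatasetname
   where City is not missing
   group by City;

   /* Pairs where the shorter value is the start of the longer one, */
   /* ignoring case, e.g. Chi and Chicago: candidates for truncation */
   create table prefix_pairs as
   select a.City as short_value, a.n as short_n,
          b.City as long_value,  b.n as long_n
   from city_counts as a, city_counts as b
   where lengthn(a.City) lt lengthn(b.City)
     and index(upcase(b.City), upcase(strip(a.City))) = 1
   order by long_value, short_value;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Legitimate pairs, such as one real city name that starts another, will show up too, so treat the result as a review list. The counts help: a rare short value next to a common long value is a likely truncation.&lt;/P&gt;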
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Once you identify suspect values you can subset your data to examine more details with code similar to:&lt;/P&gt;
&lt;PRE&gt;data detail;
   set yourdatasetname;
   where City in ('Chi' 'Chi.' 'Chgago' 'chi' );
run;&lt;/PRE&gt;
&lt;P&gt;Pick your variable and list the suspect values from your Proc Freq output. The IN operator finds &lt;STRONG&gt;exact&lt;/STRONG&gt; matches, and the output data set contains the entire record, so you have all the information in the data set associated with the suspect values.&lt;/P&gt;
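&lt;P&gt;If you would rather not list every spelling, a variation on the exact-match IN list (an alternative, not from the original post) uses the =: "starts with" comparison on the upcased value. It pulls every record whose City begins with the same letters.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data detail;
   set yourdatasetname;
   /* =: compares only the leading characters (starts with), and UPCASE */
   /* makes it case-insensitive: matches Chi, chi, Chic., Chicago, etc.  */
   where upcase(City) =: 'CHI';
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This also returns the correct "Chicago" records, which is useful for comparing the partial entries against complete ones in the same output.&lt;/P&gt;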
&lt;P&gt;What you do after that depends.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note: SAS data sets have variables, not columns.&lt;/P&gt;
</description>
      <pubDate>Fri, 11 Feb 2022 15:44:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Finding-incomplete-partial-data-in-a-dataset/m-p/795691#M32947</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-02-11T15:44:06Z</dc:date>
    </item>
  </channel>
</rss>

