<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data Cleaning in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706869#M217040</link>
    <description>&lt;P&gt;&lt;EM&gt;&amp;gt;&amp;nbsp;&amp;nbsp;I used 70-99 = "70 and above" and I think that read fine.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;0-18 could also be an interval&lt;/P&gt;
&lt;P&gt;then .&lt;/P&gt;
&lt;P&gt;then the rest:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;0-18='0-18'&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;.='Missing'&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;other='Invalid'&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This way you catch -999 or +999&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that using a formatted number rather than a string means that the format must be available when opening the data set. So it's not always a better solution.&lt;/P&gt;</description>
    <pubDate>Fri, 18 Dec 2020 03:31:16 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2020-12-18T03:31:16Z</dc:date>
    <item>
      <title>Proc Format Question - Value statement</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706548#M216856</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;I have a question regarding Proc Format and the value statement.&lt;/P&gt;&lt;P&gt;I have an external dataset that has age grouped into age ranges. The dataset needs to be cleaned and organized. For values such as 20-29ax or 30s-39 I used the compress function to squeeze out the letters and used a Proc Format to change those to "20-29" and "30-39"&lt;/P&gt;&lt;P&gt;However, what do I do about the excel columns that just say "Under 18" how would I clean that to make it presentable in SAS. I tried to do 0-17 but it didn't work because on the excel dataset it just says "Under 18"&lt;/P&gt;</description>
      <pubDate>Thu, 17 Dec 2020 07:17:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706548#M216856</guid>
      <dc:creator>shortyofhb</dc:creator>
      <dc:date>2020-12-17T07:17:20Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Format Question - Value statement</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706555#M216862</link>
      <description>&lt;P&gt;Create an informat that catches all the "funny" values entered in the spreadsheet, and alings them to your wanted values.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
invalue $funny_excel
  "Under 18" = "0-17"
  "20-29" = "20-29"
  "20-29ax" = "20-29"
  "30-39" = "30-39"
  "30s-39" = "30-39"
  other = "ERROR"
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Add all other allowed or expected values; after each import, run&amp;nbsp;a quick check for "ERROR" values to find values you had not encountered yet.&lt;/P&gt;</description>
      <pubDate>Thu, 17 Dec 2020 07:47:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706555#M216862</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-12-17T07:47:14Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Format Question - Value statement</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706658#M216895</link>
      <description>&lt;P&gt;It never hurts to show the code you have attempted.&lt;/P&gt;
&lt;P&gt;Such as show us&amp;nbsp; the code of your format.&lt;/P&gt;
&lt;P&gt;And some actual values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 17 Dec 2020 15:19:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706658#M216895</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-12-17T15:19:36Z</dc:date>
    </item>
    <item>
      <title>Re: Proc Format Question - Value statement</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706860#M217028</link>
      <description>&lt;P&gt;Proc Format Library=A;&lt;BR /&gt;Value $Ageft&lt;/P&gt;&lt;P&gt;other = "0-17"&lt;BR /&gt;18-29 = "18-29"&lt;BR /&gt;30-39 = "30-39"&lt;BR /&gt;40-49 = "40-49"&lt;BR /&gt;50-59 = "50-59"&lt;BR /&gt;60-69 = "60-69"&lt;BR /&gt;70-99 = "70+";&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The issue I am having is the age variable in my dataset is very messy.&lt;/P&gt;&lt;P&gt;It has age groups in intervals of 10, but some of the values have "20-29gt" or "30s-49" or "Under 18" or "70 and above"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;How would I go about organizing this data?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I used a compress function to get rid of any extra letters of symbols ($,&amp;amp;, etc.) out of the age intervals. But I do not know about to go about now formatting it so "Under 18" or "70 and above" will properly be coded. In addition, I need to take into account missing cells.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any suggestions would be great!&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2020 02:19:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706860#M217028</guid>
      <dc:creator>shortyofhb</dc:creator>
      <dc:date>2020-12-18T02:19:27Z</dc:date>
    </item>
    <item>
      <title>Data Cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706861#M217039</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;I need some help in cleaning up an external dataset if possible, I would greatly appreciate it!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I imported an external excel dataset into SAS. The variable age is in intervals of age groups (20-29, 30-39, etc.) I am supposed to format the age into age groups and clean the dataset to make it look nice.&lt;/P&gt;&lt;P&gt;The issue I am running into is that the dataset has a bunch of messy values like (20ag-29, 30-39exa, Under 18, 70 and above, and missing values).&lt;/P&gt;&lt;P&gt;First I used a compress function to rid of any letters or symbols in the dataset, but then I got stuck.&lt;/P&gt;&lt;P&gt;I normally use a Proc Format to create my age variables and so far have:&lt;/P&gt;&lt;P&gt;Proc Format Library=WorkData;&lt;/P&gt;&lt;P&gt;Value&amp;nbsp; &amp;nbsp;Ageft.&amp;nbsp; &amp;nbsp;18-29&amp;nbsp; &amp;nbsp;=&amp;nbsp; &amp;nbsp;"18-29"&lt;/P&gt;&lt;P&gt;etc. all the way to 60-69.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, how do I take into account the values that say "Under 18" "70 and above" as well as the blank cells?&lt;/P&gt;&lt;P&gt;Would I need to use an array for the blank cells and code them as missing?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you in advanced!&lt;/P&gt;&lt;P&gt;(Note: When I run a Proc Freq, it takes into account everything except the "Under 18" values. I used 70-99 = "70 and above" and I think that read fine.")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2020 02:37:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706861#M217039</guid>
      <dc:creator>shortyofhb</dc:creator>
      <dc:date>2020-12-18T02:37:18Z</dc:date>
    </item>
    <item>
      <title>Re: Data Cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706869#M217040</link>
      <description>&lt;P&gt;&lt;EM&gt;&amp;gt;&amp;nbsp;&amp;nbsp;I used 70-99 = "70 and above" and I think that read fine.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;0-18 could also be an interval&lt;/P&gt;
&lt;P&gt;then .&lt;/P&gt;
&lt;P&gt;then the rest:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;0-18='0-18'&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;.='Missing'&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;other='Invalid'&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This way you catch -999 or +999&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that using a formatted number rather than a string means that the format must be available when opening the data set. So it's not always a better solution.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2020 03:31:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706869#M217040</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-12-18T03:31:16Z</dc:date>
    </item>
    <item>
      <title>Re: Data Cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706871#M217041</link>
      <description>&lt;P&gt;I looked at the data.&lt;/P&gt;&lt;P&gt;There is a 18-29 interval and a 20-29 interval in the dataset. So I created an 18-29 Value in the proc format.&lt;/P&gt;&lt;P&gt;But 0-18 did not read the "Under 18" value in the datasheet.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The excel data sheet says "Under 18" so I am not sure how to get this to read.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2020 03:56:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706871#M217041</guid>
      <dc:creator>shortyofhb</dc:creator>
      <dc:date>2020-12-18T03:56:56Z</dc:date>
    </item>
    <item>
      <title>Re: Data Cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706878#M217042</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/360880"&gt;@shortyofhb&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I looked at the data.&lt;/P&gt;
&lt;P&gt;There is a 18-29 interval and a 20-29 interval in the dataset. So I created an 18-29 Value in the proc format.&lt;/P&gt;
&lt;P&gt;But 0-18 did not read the "Under 18" value in the datasheet.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The excel data sheet says "Under 18" so I am not sure how to get this to read.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Please post the data you have in usable form: a data step using datalines-statment. If you have "Under 18" as value in your data, then using a numeric format won't give you the expected results.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;EDIT: This looks like &lt;A href="https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706548," target="_blank"&gt;https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706548,&lt;/A&gt; i will merge both posts.&lt;/P&gt;</description>
      <pubDate>Fri, 18 Dec 2020 05:23:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-Format-Question-Value-statement/m-p/706878#M217042</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2020-12-18T05:23:06Z</dc:date>
    </item>
  </channel>
</rss>

