<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data cleaning in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44796#M9178</link>
    <description>Consider using a DATA step with the INDEXW function (or a related function such as INDEX, FIND, or FINDW) to search each value for a list of data strings representing your candidate core substring values.&lt;BR /&gt;
&lt;BR /&gt;
For technical papers and references on this topic, I recommend searching the SAS support website (&lt;A href="http://support.sas.com/" target="_blank"&gt;http://support.sas.com/&lt;/A&gt;) with the keywords "cleaning" and "cleansing". Alternatively, this Google advanced search argument yields some useful results:&lt;BR /&gt;
&lt;BR /&gt;
data cleaning cleansing site:sas.com&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.</description>
    <pubDate>Mon, 15 Jun 2009 15:09:14 GMT</pubDate>
    <dc:creator>sbb</dc:creator>
    <dc:date>2009-06-15T15:09:14Z</dc:date>
    <item>
      <title>Data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44795#M9177</link>
      <description>(ACCESS) Vagisil&lt;BR /&gt;
Free Vagisil&lt;BR /&gt;
*** Vagisil ***&lt;BR /&gt;
HRC Vagisil&lt;BR /&gt;
&lt;BR /&gt;
The above is raw data for a drug name (other variations are possible). The task is to create a new variable, clean_drug: if the raw data contains Vagisil, then clean_drug is "Vagisil".&lt;BR /&gt;
&lt;BR /&gt;
The same applies to other drug names. Any help is appreciated.</description>
      <pubDate>Mon, 15 Jun 2009 14:55:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44795#M9177</guid>
      <dc:creator>SASPhile</dc:creator>
      <dc:date>2009-06-15T14:55:39Z</dc:date>
    </item>
    <item>
      <title>Re: Data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44796#M9178</link>
      <description>Consider using a DATA step with the INDEXW function (or a related function such as INDEX, FIND, or FINDW) to search each value for a list of data strings representing your candidate core substring values.&lt;BR /&gt;
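A minimal sketch of this approach, assuming the list of core names is short enough to hard-code in a temporary array. The data set names (raw, cleaned), the variable drugname, and the two-name list (Vagisil, Bobs) are hypothetical; the sample rows are adapted from the question, with one name lowercased to show the case handling.&lt;BR /&gt;

```sas
/* Hypothetical sample data adapted from the question */
data raw;
  infile datalines truncover;
  length drugname $40;
  input drugname $40.;
  datalines;
(ACCESS) Vagisil
Free Vagisil
*** Vagisil ***
HRC vagisil
Get Bobs
;
run;

/* Map each raw string to a clean name with INDEXW against a list of */
/* candidate core names (the list is a hypothetical example)         */
data cleaned;
  set raw;
  length clean_drug $40;
  /* Core names, written in the case wanted for the output */
  array core[2] $40 _temporary_ ('Vagisil' 'Bobs');
  clean_drug = ' ';
  do i = 1 to dim(core);
    /* UPCASE both arguments so the search ignores case; INDEXW only */
    /* matches whole words, delimited by blanks by default           */
    if indexw(upcase(drugname), upcase(strip(core[i]))) gt 0 then do;
      clean_drug = core[i];
      leave;
    end;
  end;
  drop i;
run;
```

Because the loop stops at the first hit, names that can contain a shorter listed name as a word (for example a longer product name) would need to be listed first.&lt;BR /&gt;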
&lt;BR /&gt;
For technical papers and references on this topic, I recommend searching the SAS support website (&lt;A href="http://support.sas.com/" target="_blank"&gt;http://support.sas.com/&lt;/A&gt;) with the keywords "cleaning" and "cleansing". Alternatively, this Google advanced search argument yields some useful results:&lt;BR /&gt;
&lt;BR /&gt;
data cleaning cleansing site:sas.com&lt;BR /&gt;
&lt;BR /&gt;
Scott Barry&lt;BR /&gt;
SBBWorks, Inc.</description>
      <pubDate>Mon, 15 Jun 2009 15:09:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44796#M9178</guid>
      <dc:creator>sbb</dc:creator>
      <dc:date>2009-06-15T15:09:14Z</dc:date>
    </item>
    <item>
      <title>Re: Data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44797#M9179</link>
      <description>If this is a common problem, and it is important to have the data properly cleansed, I suggest that you look into DataFlux dfPower Studio and possibly SAS Data Quality, which provide tools designed to solve these kinds of problems.&lt;BR /&gt;
&lt;BR /&gt;
/Linus</description>
      <pubDate>Tue, 16 Jun 2009 05:50:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44797#M9179</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2009-06-16T05:50:41Z</dc:date>
    </item>
    <item>
      <title>Re: Data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44798#M9180</link>
      <description>Just a quick example; note that case is not taken care of.&lt;BR /&gt;
&lt;BR /&gt;
The first macro removes known trash text, and the second keeps only known drug names.&lt;BR /&gt;
&lt;BR /&gt;
data test;&lt;BR /&gt;
 length a $30;&lt;BR /&gt;
 input a $30.;&lt;BR /&gt;
 datalines;&lt;BR /&gt;
(ACCESS) Vagisil&lt;BR /&gt;
Free Vagisil&lt;BR /&gt;
*** Vagisil ***&lt;BR /&gt;
HRC Vagisil&lt;BR /&gt;
Get Bobs&lt;BR /&gt;
;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
%macro clean;&lt;BR /&gt;
%let clean_list = (ACCESS)|HRC|Free|***;&lt;BR /&gt;
data test1;&lt;BR /&gt;
 set test;&lt;BR /&gt;
 length clean_drug $200;&lt;BR /&gt;
 clean_drug = a;&lt;BR /&gt;
 %let i = 1;&lt;BR /&gt;
 %do %while (%bquote(%scan(&amp;amp;clean_list,&amp;amp;i,|)) ne %str());&lt;BR /&gt;
   clean_drug = strip(tranwrd(clean_drug,"%scan(%quote(&amp;amp;clean_list),&amp;amp;i,|)",""));&lt;BR /&gt;
   %let i = %eval(&amp;amp;i + 1);&lt;BR /&gt;
 %end;&lt;BR /&gt;
run;&lt;BR /&gt;
%mend;&lt;BR /&gt;
%clean;&lt;BR /&gt;
&lt;BR /&gt;
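The TRANWRD calls generated by the macro above match case exactly. As a hedged alternative (not part of the original reply), the removal step could be done case-insensitively in a single DATA step with PRXCHANGE, using the same junk tokens as clean_list. The data set names junk and junk_cleaned and the mixed-case sample rows are hypothetical.&lt;BR /&gt;

```sas
/* Hypothetical sample data; the mixed case shows the i modifier at work */
data junk;
  infile datalines truncover;
  length a $30;
  input a $30.;
  datalines;
(ACCESS) Vagisil
free Vagisil
*** Vagisil ***
hrc Vagisil
Get Bobs
;
run;

data junk_cleaned;
  set junk;
  length clean_drug $200;
  /* One pattern built from the same tokens as clean_list; ( ) and * are   */
  /* regex metacharacters, so they are escaped. \b keeps HRC and Free from */
  /* matching inside longer words. The trailing i makes the match          */
  /* case-insensitive, and -1 replaces every occurrence.                   */
  clean_drug = prxchange('s/\(ACCESS\)|\bHRC\b|\bFree\b|\*\*\*//i', -1, a);
  /* Remove the blanks left behind by the deleted tokens */
  clean_drug = compbl(strip(clean_drug));
run;
```

The pattern is hard-coded here for clarity; with a longer token list it could be assembled from a macro variable instead, at the cost of escaping each token.&lt;BR /&gt;
&lt;BR /&gt;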
%macro keep;&lt;BR /&gt;
%let keep_list = Vagisil|Bobs;&lt;BR /&gt;
data test2;&lt;BR /&gt;
 set test;&lt;BR /&gt;
 length clean_drug $200;&lt;BR /&gt;
 %let i = 1;&lt;BR /&gt;
 %do %while (%bquote(%scan(&amp;amp;keep_list,&amp;amp;i,|)) ne %str());&lt;BR /&gt;
   if indexw(a,"%scan(%quote(&amp;amp;keep_list),&amp;amp;i,|)") then clean_drug = "%scan(%quote(&amp;amp;keep_list),&amp;amp;i,|)";&lt;BR /&gt;
   %let i = %eval(&amp;amp;i + 1);&lt;BR /&gt;
 %end;&lt;BR /&gt;
run;&lt;BR /&gt;
%mend;&lt;BR /&gt;
%keep;</description>
      <pubDate>Tue, 16 Jun 2009 15:31:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44798#M9180</guid>
      <dc:creator>FredrikE</dc:creator>
      <dc:date>2009-06-16T15:31:12Z</dc:date>
    </item>
    <item>
      <title>Re: Data cleaning</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44799#M9181</link>
      <description>That's some code to help soothe your pain....&lt;BR /&gt;
/Sorry, couldn't resist.</description>
      <pubDate>Wed, 17 Jun 2009 14:02:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-cleaning/m-p/44799#M9181</guid>
      <dc:creator>CameronLawson</dc:creator>
      <dc:date>2009-06-17T14:02:43Z</dc:date>
    </item>
  </channel>
</rss>

