<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Data Cleaning - Character String in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Data-Cleaning-Character-String/m-p/379257#M91276</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have an issue with my data.&lt;/P&gt;&lt;P&gt;I am trying to clean data that I received as a raw file.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;An example of the data I am working with is below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA TEMP;
INPUT @1 Ref 7. 
	  @9 ID 5. 
	  @15 LockerNo 12.
	  @28 Date ddmmyy10.
	  @39 Cust_Name $15.
;
format date ddmmyy10.;
DATALINES;
4511445 1952  123456789123 08/12/2001 XYZ PRODUCTION 
4545154 1952  987654321987 26/09/2001 DEF Co.        
7895412 96321 456123789456 12/10/2000 PINK Enterprise
7895412 96321 123789654753 21/12/2000 GbC AGENCY     
5451654 96321 125874934589 05/10/2006 ABC AGENCY     
5451654 96321 123467984352 23/11/2004 ABC AGENCu     
5451654 96321 785645464644 17/01/2005 ABC AGENCY     
7895412 96321 123789654753 21/12/2000 ABC AGENCY     
;
RUN; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What I am trying to achieve is:&lt;/P&gt;&lt;P&gt;Note that everything is grouped by ID.&lt;/P&gt;&lt;P&gt;1) Within an ID, if the &lt;EM&gt;&lt;STRONG&gt;first 5 characters&lt;/STRONG&gt;&lt;/EM&gt; of Cust_Name &lt;STRONG&gt;or&lt;/STRONG&gt; the &lt;EM&gt;&lt;STRONG&gt;last 5 characters&lt;/STRONG&gt;&lt;/EM&gt; of Cust_Name are the same, then a new variable, Final, should take the value of the &lt;U&gt;&lt;EM&gt;Ref with the latest Date&lt;/EM&gt;&lt;/U&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the table I am trying to achieve:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Ref&lt;/TD&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;Date&lt;/TD&gt;&lt;TD&gt;Cust_Name&lt;/TD&gt;&lt;TD&gt;Final&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4511445&lt;/TD&gt;&lt;TD&gt;1952&lt;/TD&gt;&lt;TD&gt;8/12/2001&lt;/TD&gt;&lt;TD&gt;XYZ PRODUCTION&lt;/TD&gt;&lt;TD&gt;4511445&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4545154&lt;/TD&gt;&lt;TD&gt;1952&lt;/TD&gt;&lt;TD&gt;26/9/2001&lt;/TD&gt;&lt;TD&gt;DEF Co.&lt;/TD&gt;&lt;TD&gt;4545154&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7895412&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;12/10/2000&lt;/TD&gt;&lt;TD&gt;PINK Enterprise&lt;/TD&gt;&lt;TD&gt;7895412&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7895412&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;21/12/2000&lt;/TD&gt;&lt;TD&gt;GbC AGENCY&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;5/10/2006&lt;/TD&gt;&lt;TD&gt;ABC AGENCY&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;23/11/2004&lt;/TD&gt;&lt;TD&gt;ABC AGENCu&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;17/1/2005&lt;/TD&gt;&lt;TD&gt;ABC AGENCY&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;21/12/2000&lt;/TD&gt;&lt;TD&gt;ABC AGENCY&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cust_Name is a character string that can follow many patterns; looking at the data, some names match on the first 5 characters and others on the last 5 characters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I really appreciate your help on this!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Btw, I am using SAS Enterprise Guide 7.1.&lt;/P&gt;</description>
    <pubDate>Wed, 26 Jul 2017 01:52:10 GMT</pubDate>
    <dc:creator>AieuYuhara</dc:creator>
    <dc:date>2017-07-26T01:52:10Z</dc:date>
    <item>
      <title>Data Cleaning - Character String</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-Cleaning-Character-String/m-p/379257#M91276</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have an issue with my data.&lt;/P&gt;&lt;P&gt;I am trying to clean data that I received as a raw file.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;An example of the data I am working with is below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;DATA TEMP;
INPUT @1 Ref 7. 
	  @9 ID 5. 
	  @15 LockerNo 12.
	  @28 Date ddmmyy10.
	  @39 Cust_Name $15.
;
format date ddmmyy10.;
DATALINES;
4511445 1952  123456789123 08/12/2001 XYZ PRODUCTION 
4545154 1952  987654321987 26/09/2001 DEF Co.        
7895412 96321 456123789456 12/10/2000 PINK Enterprise
7895412 96321 123789654753 21/12/2000 GbC AGENCY     
5451654 96321 125874934589 05/10/2006 ABC AGENCY     
5451654 96321 123467984352 23/11/2004 ABC AGENCu     
5451654 96321 785645464644 17/01/2005 ABC AGENCY     
7895412 96321 123789654753 21/12/2000 ABC AGENCY     
;
RUN; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What I am trying to achieve is:&lt;/P&gt;&lt;P&gt;Note that everything is grouped by ID.&lt;/P&gt;&lt;P&gt;1) Within an ID, if the &lt;EM&gt;&lt;STRONG&gt;first 5 characters&lt;/STRONG&gt;&lt;/EM&gt; of Cust_Name &lt;STRONG&gt;or&lt;/STRONG&gt; the &lt;EM&gt;&lt;STRONG&gt;last 5 characters&lt;/STRONG&gt;&lt;/EM&gt; of Cust_Name are the same, then a new variable, Final, should take the value of the &lt;U&gt;&lt;EM&gt;Ref with the latest Date&lt;/EM&gt;&lt;/U&gt;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Below is the table I am trying to achieve:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Ref&lt;/TD&gt;&lt;TD&gt;ID&lt;/TD&gt;&lt;TD&gt;Date&lt;/TD&gt;&lt;TD&gt;Cust_Name&lt;/TD&gt;&lt;TD&gt;Final&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4511445&lt;/TD&gt;&lt;TD&gt;1952&lt;/TD&gt;&lt;TD&gt;8/12/2001&lt;/TD&gt;&lt;TD&gt;XYZ PRODUCTION&lt;/TD&gt;&lt;TD&gt;4511445&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4545154&lt;/TD&gt;&lt;TD&gt;1952&lt;/TD&gt;&lt;TD&gt;26/9/2001&lt;/TD&gt;&lt;TD&gt;DEF Co.&lt;/TD&gt;&lt;TD&gt;4545154&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7895412&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;12/10/2000&lt;/TD&gt;&lt;TD&gt;PINK Enterprise&lt;/TD&gt;&lt;TD&gt;7895412&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7895412&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;21/12/2000&lt;/TD&gt;&lt;TD&gt;GbC AGENCY&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;5/10/2006&lt;/TD&gt;&lt;TD&gt;ABC AGENCY&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;23/11/2004&lt;/TD&gt;&lt;TD&gt;ABC AGENCu&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;17/1/2005&lt;/TD&gt;&lt;TD&gt;ABC AGENCY&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;TD&gt;96321&lt;/TD&gt;&lt;TD&gt;21/12/2000&lt;/TD&gt;&lt;TD&gt;ABC AGENCY&lt;/TD&gt;&lt;TD&gt;5451654&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Cust_Name is a character string that can follow many patterns; looking at the data, some names match on the first 5 characters and others on the last 5 characters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I really appreciate your help on this!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Btw, I am using SAS Enterprise Guide 7.1.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2017 01:52:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-Cleaning-Character-String/m-p/379257#M91276</guid>
      <dc:creator>AieuYuhara</dc:creator>
      <dc:date>2017-07-26T01:52:10Z</dc:date>
    </item>
    <item>
      <title>Re: Data Cleaning - Character String</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-Cleaning-Character-String/m-p/379263#M91277</link>
      <description>&lt;P&gt;What is Final for this data?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;ref ID Cust_Name Final
1 1 12345ABCDE ?
2 1 12345ABCDF ?
3 1 12346ABCDF ?
&lt;/PRE&gt;
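&lt;P&gt;As a rough sketch, the direct matches in this example can be listed with a self-join. The dataset names chain and pairs are made up for illustration, and the rule is assumed to compare the first five and the last five non-blank characters of Cust_Name within an ID.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Rows copied from the example above; chain is a hypothetical name */
data chain;
  input ref id cust_name :$10.;
  datalines;
1 1 12345ABCDE
2 1 12345ABCDF
3 1 12346ABCDF
;
run;

/* One output row per pair of observations in the same ID whose names
   share the first five or the last five non-blank characters */
proc sql;
  create table pairs as
  select a.id, a.ref as ref_a, b.ref as ref_b
  from chain as a, chain as b
  where a.id = b.id
    and a.ref lt b.ref
    and (substr(a.cust_name, 1, 5) = substr(b.cust_name, 1, 5)
      or substr(a.cust_name, max(1, length(a.cust_name) - 4), 5)
       = substr(b.cust_name, max(1, length(b.cust_name) - 4), 5))
  order by ref_a, ref_b;
quit;&lt;/CODE&gt;&lt;/PRE&gt;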
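&lt;P&gt;The Cust_Name values in obs 1 and 2 share the first five characters, and those in obs 2 and 3 share the last five characters, but obs 1 and 3 share neither, so it is unclear whether all three belong to one group.&lt;/P&gt;</description>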
      <pubDate>Wed, 26 Jul 2017 02:34:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-Cleaning-Character-String/m-p/379263#M91277</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2017-07-26T02:34:41Z</dc:date>
    </item>
    <item>
      <title>Re: Data Cleaning - Character String</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Data-Cleaning-Character-String/m-p/379266#M91279</link>
      <description>&lt;P&gt;Hi PG,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Final holds the Ref of the parent record that each row should refer to. (The parent is the Ref with the latest date.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Cust_Name values (last column) in observations 1 and 2 do not share the first five characters: one is XYZ PRODUCTION and the other is DEF Co. Since they share neither the first 5 nor the last 5 characters, each keeps its own Ref as parent.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the code I use to get the variable Final. The problem is that it treats Cust_Name values as separate groups even when they differ by only two or three characters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=temp;
by id cust_name descending date;
run;

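/* Within each ID and exact Cust_Name, carry the Ref of the first row
   (the latest Date, given the descending sort) down as Final. */
data want;
set temp;
by id cust_name;
retain final;
if first.cust_name then final = ref;
run;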

proc sort data=want;
by id cust_name date;
run;
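
/* Sketch only, not part of the attempt above: one possible way to apply
   the first-5 or last-5 rule. The output name want_fuzzy is hypothetical.
   Assumptions: only direct matches within an ID are linked (no chaining
   through a third name), the last 5 characters are taken from the
   non-blank part of Cust_Name, and a tie on the latest Date yields one
   output row per tied Ref. */
proc sql;
  create table want_fuzzy as
  select a.ref, a.id, a.lockerno, a.date, a.cust_name,
         b.ref as final
  from temp as a, temp as b
  where a.id = b.id
    and (substr(a.cust_name, 1, 5) = substr(b.cust_name, 1, 5)
      or substr(a.cust_name, max(1, length(a.cust_name) - 4), 5)
       = substr(b.cust_name, max(1, length(b.cust_name) - 4), 5))
  /* one group per input row; keep the matching row with the latest Date.
     PROC SQL remerges max(b.date) back onto the joined rows. */
  group by a.ref, a.id, a.lockerno, a.date, a.cust_name
  having b.date = max(b.date)
  order by a.id, a.cust_name, a.date;
quit;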

proc print data=want noobs;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;After you run the above code, I want observation 7 to take 5451654 as Final.&lt;BR /&gt;Why? Because the rows are in the same ID and only some of the characters differ.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now I am looking for code that can group the data even when some Cust_Name values differ by one or two characters.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Appreciate your help!&lt;/P&gt;</description>
      <pubDate>Wed, 26 Jul 2017 03:19:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Data-Cleaning-Character-String/m-p/379266#M91279</guid>
      <dc:creator>AieuYuhara</dc:creator>
      <dc:date>2017-07-26T03:19:49Z</dc:date>
    </item>
  </channel>
</rss>

