<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: inporting character data in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7900#M244</link>
    <description>Hi:&lt;BR /&gt;
  I was going to recommend the PRX (Perl Regular Expressions) method, too -- but Tim beat me to it. Perl regular expressions are cool and were introduced in SAS 9. For more info, see here:&lt;BR /&gt;
&lt;A href="http://www2.sas.com/proceedings/sugi29/265-29.pdf" target="_blank"&gt;http://www2.sas.com/proceedings/sugi29/265-29.pdf&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://support.sas.com/rnd/base/datastep/perl_regexp/index.html" target="_blank"&gt;http://support.sas.com/rnd/base/datastep/perl_regexp/index.html&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://analytics.ncsu.edu/sesug/2006/AP09_06.PDF" target="_blank"&gt;http://analytics.ncsu.edu/sesug/2006/AP09_06.PDF&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://www.pharmasug.org/2005/TU02.pdf" target="_blank"&gt;http://www.pharmasug.org/2005/TU02.pdf&lt;/A&gt;&lt;BR /&gt;
&lt;BR /&gt;
I think that otherwise, something with scan, substr, findc, anyalpha and/or notalpha functions could help you divide the address into the leading zeroes and the rest of the field -- but not as elegantly as PRX functions.&lt;BR /&gt;
 &lt;BR /&gt;
cynthia</description>
    <pubDate>Mon, 31 Mar 2008 22:52:55 GMT</pubDate>
    <dc:creator>Cynthia_sas</dc:creator>
    <dc:date>2008-03-31T22:52:55Z</dc:date>
    <item>
      <title>inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7889#M233</link>
      <description>Greetings:&lt;BR /&gt;
&lt;BR /&gt;
I am trying to import character with a field that has zeros on the left hand side instead of nulls.  For example, the a value may have 000186TH instead of 186TH.  What is the best way of handling this ?  Should the entire field be parsed one character at a time, is there a string function that can access the left most fields or is there a flag that can be set which turns leading zeros to nulls ?&lt;BR /&gt;
&lt;BR /&gt;
Thanks.&lt;BR /&gt;
&lt;BR /&gt;
alan</description>
      <pubDate>Thu, 27 Mar 2008 23:26:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7889#M233</guid>
      <dc:creator>alanj</dc:creator>
      <dc:date>2008-03-27T23:26:35Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7890#M234</link>
      <description>Well a couple of options but it depends on the data.&lt;BR /&gt;
&lt;BR /&gt;
Are you looking for the same number of left hand characters? &lt;BR /&gt;
Or is it always a number followed by two characters?&lt;BR /&gt;
Or, worst case, is it just any set of alphanumerics with leading zeros?&lt;BR /&gt;
&lt;BR /&gt;
Case 1. There is a function for that&lt;BR /&gt;
Case 2. There is a trick you can use&lt;BR /&gt;
Case 3. I think you are stuck doing a character by character parsing.&lt;BR /&gt;
&lt;BR /&gt;
Let me know more details and I'll try to help you out.&lt;BR /&gt;
&lt;BR /&gt;
Ike Eisenhauer</description>
      <pubDate>Sat, 29 Mar 2008 03:29:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7890#M234</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-03-29T03:29:42Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7891#M235</link>
      <description>Good morning:&lt;BR /&gt;
&lt;BR /&gt;
Thanks, Ike.&lt;BR /&gt;
&lt;BR /&gt;
The pattern is&lt;BR /&gt;
zerosValidcharacrters&lt;BR /&gt;
&lt;BR /&gt;
where zeros is anywhere from 1 to 5 zeros in a row.&lt;BR /&gt;
Validcharacters is a street address and can be 186TH or Elm.&lt;BR /&gt;
&lt;BR /&gt;
The following are possible in the file:&lt;BR /&gt;
0186TH&lt;BR /&gt;
00186TH&lt;BR /&gt;
000186TH&lt;BR /&gt;
0000186Th&lt;BR /&gt;
00000186TH&lt;BR /&gt;
&lt;BR /&gt;
Currently, I am using proc sql to process the data.&lt;BR /&gt;
&lt;BR /&gt;
alan</description>
      <pubDate>Mon, 31 Mar 2008 17:10:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7891#M235</guid>
      <dc:creator>alanj</dc:creator>
      <dc:date>2008-03-31T17:10:43Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7892#M236</link>
      <description>Hi:&lt;BR /&gt;
  The technique would vary. A simple translate function could get rid of zeroes, but would mess up on 00020TH or 000150TH depending on whether you'd ever have this:&lt;BR /&gt;
zeroes part of ValidCharacters:&lt;BR /&gt;
00020TH&lt;BR /&gt;
00000150TH&lt;BR /&gt;
&lt;BR /&gt;
(which it sounds like you could have the above).&lt;BR /&gt;
&lt;BR /&gt;
But would you ever have this:&lt;BR /&gt;
&lt;BR /&gt;
Elm00000  (zeroes after)&lt;BR /&gt;
00000Apt10 (not ST,  ND, RD, TH in the string)&lt;BR /&gt;
&lt;BR /&gt;
cynthia</description>
      <pubDate>Mon, 31 Mar 2008 18:06:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7892#M236</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2008-03-31T18:06:23Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7893#M237</link>
      <description>Good question, Cynthia.  The apartment number is in a separate field and also has leading zeros.&lt;BR /&gt;
&lt;BR /&gt;
You are absolutely right, that an address such as 1050 Elm would get lost if a translate were used.  It's one of the reasons that I've found this problem to be such a sticky wicket.&lt;BR /&gt;
&lt;BR /&gt;
Thanks for you reply.</description>
      <pubDate>Mon, 31 Mar 2008 18:17:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7893#M237</guid>
      <dc:creator>alanj</dc:creator>
      <dc:date>2008-03-31T18:17:42Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7894#M238</link>
      <description>Ah, now we're getting closer...you've been showing just the "street" 000186TH or 00000Elm not the number and street. So I didn't consider that you could have this:&lt;BR /&gt;
&lt;BR /&gt;
0001050 Elm&lt;BR /&gt;
&lt;BR /&gt;
Is the Street NUMBER part of this field, too????  What's the max length of the field? Are there internal "spaces" so it is:&lt;BR /&gt;
00001050 Elm??? or 000123 NE 1st St??? Is the field justified to the left, with leading zeroes, as you show -- what's trailing for short addresses??? spaces?&lt;BR /&gt;
&lt;BR /&gt;
could you also have:&lt;BR /&gt;
[pre]&lt;BR /&gt;
123 Elm&lt;BR /&gt;
0123 Elm&lt;BR /&gt;
00123 Elm&lt;BR /&gt;
000123 Elm&lt;BR /&gt;
0000123 Elm&lt;BR /&gt;
00000123 Elm&lt;BR /&gt;
1050 Elm&lt;BR /&gt;
001050 Elm&lt;BR /&gt;
01050 Elm&lt;BR /&gt;
00001050 Elm&lt;BR /&gt;
1111 Twisted Turny Long Avenue, NE&lt;BR /&gt;
[/pre]&lt;BR /&gt;
&lt;BR /&gt;
That is a complication. In which case, looping through the field character by character might be your only choice. And, that means, you have to know when to set the bounds for stopping the loop -- because the loop will have the same issue with 0000150TH and 0001050 Elm if you don't stop it at the right place. And aren't these also possible valid street names:&lt;BR /&gt;
&lt;BR /&gt;
0000NE 1st ST (so you'd stop at the alpha character, not at the 1)&lt;BR /&gt;
000SE 10th Ave &lt;BR /&gt;
000RR #3 &lt;BR /&gt;
0000SR 23&lt;BR /&gt;
0000PO Box 61&lt;BR /&gt;
 &lt;BR /&gt;
I sort of expected that your leading zero data would look like this, so the number of zeroes in front of the address wouldn't vary the way you showed and that 123 Elm might have these variations:&lt;BR /&gt;
[pre]&lt;BR /&gt;
000000000000000000000000000123 Elm&lt;BR /&gt;
000000000000000000000000123 Elm St&lt;BR /&gt;
00000000000000000000000123 Elm St.&lt;BR /&gt;
00000000000000000000123 Elm Street&lt;BR /&gt;
000000000000000000000000001050 Elm&lt;BR /&gt;
0000001234 NE 1st Street, Extended&lt;BR /&gt;
1111 Twisted Turny Long Avenue, NE&lt;BR /&gt;
[/pre]&lt;BR /&gt;
&lt;BR /&gt;
I don't suppose you can go back to the folks who supply the data and have them change how they initialize the field -- with blanks instead of 0s?? &lt;BR /&gt;
&lt;BR /&gt;
cynthia</description>
      <pubDate>Mon, 31 Mar 2008 18:52:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7894#M238</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2008-03-31T18:52:30Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7895#M239</link>
      <description>It's Regular Expressions to the rescue!&lt;BR /&gt;
&lt;BR /&gt;
[pre]&lt;BR /&gt;
data _null_;&lt;BR /&gt;
   infile datalines;&lt;BR /&gt;
   retain r;&lt;BR /&gt;
   if _n_ = 1 then&lt;BR /&gt;
      r = prxparse("/^0*(.*)$/");&lt;BR /&gt;
   input address $80.;&lt;BR /&gt;
   if prxmatch(r, address) then do;&lt;BR /&gt;
      call prxposn(r, 1, start, length);&lt;BR /&gt;
      address = substr(address, start, length);&lt;BR /&gt;
   end;&lt;BR /&gt;
   put address;&lt;BR /&gt;
datalines;&lt;BR /&gt;
000000000000000000000000000123 Elm&lt;BR /&gt;
000000000000000000000000123 Elm St&lt;BR /&gt;
00000000000000000000000123 Elm St.&lt;BR /&gt;
00000000000000000000123 Elm Street&lt;BR /&gt;
000000000000000000000000001050 Elm&lt;BR /&gt;
0000001234 NE 1st Street, Extended&lt;BR /&gt;
1111 Twisted Turny Long Avenue, NE&lt;BR /&gt;
;;;;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
produces&lt;BR /&gt;
[pre]&lt;BR /&gt;
42   data _null_;&lt;BR /&gt;
43   infile datalines;&lt;BR /&gt;
44   retain r;&lt;BR /&gt;
45   if _n_ = 1 then&lt;BR /&gt;
46       r = prxparse("/^0*(.*)$/");&lt;BR /&gt;
47   input address $80.;&lt;BR /&gt;
48   if prxmatch(r, address) then do;&lt;BR /&gt;
49      call prxposn(r, 1, start, length);&lt;BR /&gt;
50      address = substr(address, start, length);&lt;BR /&gt;
51   end;&lt;BR /&gt;
52   put address;&lt;BR /&gt;
53   datalines;&lt;BR /&gt;
&lt;BR /&gt;
123 Elm&lt;BR /&gt;
123 Elm St&lt;BR /&gt;
123 Elm St.&lt;BR /&gt;
123 Elm Street&lt;BR /&gt;
1050 Elm&lt;BR /&gt;
1234 NE 1st Street, Extended&lt;BR /&gt;
1111 Twisted Turny Long Avenue, NE&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           0.01 seconds&lt;BR /&gt;
      cpu time            0.01 seconds&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
61   ;;;;&lt;BR /&gt;
[/pre]</description>
      <pubDate>Mon, 31 Mar 2008 19:30:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7895#M239</guid>
      <dc:creator>Tim_SAS</dc:creator>
      <dc:date>2008-03-31T19:30:33Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7896#M240</link>
      <description>Cynthia:&lt;BR /&gt;
&lt;BR /&gt;
The text file includes a separate field for street suffix.  Here is an example of the street number, suffix and name.  The fields are fixed width.  Leading zeros fill the field..&lt;BR /&gt;
&lt;BR /&gt;
street number        street suffix     streetname&lt;BR /&gt;
0000029221          W                   000186TH&lt;BR /&gt;
&lt;BR /&gt;
I am going to check with the vendor tomorrow about the format.  Currently, the script hacks is way through each field with something like this.&lt;BR /&gt;
set propstreetname = substr(propstreetname,3,length(propstreetname))&lt;BR /&gt;
where find(propstreetname,'00') = 1;&lt;BR /&gt;
update sasuser.propinfo&lt;BR /&gt;
&lt;BR /&gt;
Getting data with leading zeros would make data loading much faster and easier.&lt;BR /&gt;
&lt;BR /&gt;
Again, thanks for the reply.&lt;BR /&gt;
&lt;BR /&gt;
alan</description>
      <pubDate>Mon, 31 Mar 2008 19:47:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7896#M240</guid>
      <dc:creator>alanj</dc:creator>
      <dc:date>2008-03-31T19:47:55Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7897#M241</link>
      <description>Thanks, Tim.&lt;BR /&gt;
&lt;BR /&gt;
Very interesting solution. I must admit that I hadn't heard of these functions before.&lt;BR /&gt;
&lt;BR /&gt;
alan</description>
      <pubDate>Mon, 31 Mar 2008 19:53:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7897#M241</guid>
      <dc:creator>alanj</dc:creator>
      <dc:date>2008-03-31T19:53:17Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7898#M242</link>
      <description>some people recommend the address cleaning services. ~ Perhaps that is, or should be, part of the Data Integration/Data Flux, or the "Q in ETL to the power of Q".&lt;BR /&gt;
&lt;BR /&gt;
Although this is more of a language thing, it might be nice to have a proc that would help with address cleaning!&lt;BR /&gt;
&lt;BR /&gt;
The immediate parsing of the numbers in the address is less of a problem than deciding for what that number is provided, e.g.floor/apartment/building/business-park-block number!&lt;BR /&gt;
&lt;BR /&gt;
OK here is the first number [pre]   number1 = input( scan( full_address,1, 'a' ), 32. );[/pre]&lt;BR /&gt;
Just alter that 1 to N to get the N-th number out of the address.&lt;BR /&gt;
&lt;BR /&gt;
I shall have to wait for SAS9.2 to see if the addition of 'a' modifier to SCAN() works the way that suits this challenge.&lt;BR /&gt;
&lt;BR /&gt;
Here is something I've tested in SAS9.1.3, with the examples Cynthia provided[pre]data trial;&lt;BR /&gt;
infile cards;&lt;BR /&gt;
input full_address $char80. ;&lt;BR /&gt;
   number1 = input( substr( full_address, 1&lt;BR /&gt;
                          , findc( trim(full_address) !!'0-'&lt;BR /&gt;
                                 , ' 1234567890', 'v' )  -1&lt;BR /&gt;
                          )&lt;BR /&gt;
                   , 32. );&lt;BR /&gt;
list;cards;&lt;BR /&gt;
0001050 Elm&lt;BR /&gt;
00001050 Elm??? or &lt;BR /&gt;
...[/pre]&lt;BR /&gt;
&lt;BR /&gt;
It is not as attractive as the SAS9.2 solution, but for now, it delivers. ~~ as below&lt;BR /&gt;
[pre]+FSVIEW:  WORK.TRIAL (B)--------------------------------------+                                    &lt;BR /&gt;
|      number1     full_address                               |                                    &lt;BR /&gt;
|                                                             |                                    &lt;BR /&gt;
|         1050     0001050 Elm                                |                                    &lt;BR /&gt;
|         1050     00001050 Elm??? or                         |                                    &lt;BR /&gt;
|          123     000123 NE 1st St???  Is the field justified|                                    &lt;BR /&gt;
|          123     123 Elm                                    |                                    &lt;BR /&gt;
|          123     0123 Elm                                   |                                    &lt;BR /&gt;
|          123     00123 Elm                                  |                                    &lt;BR /&gt;
|          123     000123 Elm                                 |                                    &lt;BR /&gt;
|          123     0000123 Elm                                |                                    &lt;BR /&gt;
|          123     00000123 Elm                               |                                    &lt;BR /&gt;
|         1050     1050 Elm                                   |                                    &lt;BR /&gt;
|         1050     001050 Elm                                 |                                    &lt;BR /&gt;
|         1050     01050 Elm                                  |                                    &lt;BR /&gt;
|         1050     00001050 Elm                               |                                    &lt;BR /&gt;
|         1111     1111 Twisted Turny Long Avenue, NE         |                                    &lt;BR /&gt;
|            .                                                |                                    &lt;BR /&gt;
|          150     0000150TH and                              |                                    &lt;BR /&gt;
|         1050     0001050 Elm if you don't stop it at the rig|                                    &lt;BR /&gt;
|            0     0000NE 1st ST (so you'd stop at the alpha c|                                    &lt;BR /&gt;
|            0     000SE 10th Ave                             |                                    &lt;BR /&gt;
|            0     000RR #3                                   |                                    &lt;BR /&gt;
|            0     0000SR 23                                  |                                    &lt;BR /&gt;
|            0     0000PO Box 61                              |                                    &lt;BR /&gt;
|          123     123 Elm might have these variations:       |                                    &lt;BR /&gt;
|            .                                                |                                    &lt;BR /&gt;
|          123     000000000000000000000000000123 Elm         |                                    &lt;BR /&gt;
|          123     000000000000000000000000123 Elm St         |                                    &lt;BR /&gt;
|          123     00000000000000000000000123 Elm St.         |                                    &lt;BR /&gt;
|          123     00000000000000000000123 Elm Street         |                                    &lt;BR /&gt;
|         1050     000000000000000000000000001050 Elm         |                                    &lt;BR /&gt;
|         1234     0000001234 NE 1st Street, Extended         |                                    &lt;BR /&gt;
|         1111     1111 Twisted Turny Long Avenue, NE         |                                                                        &lt;BR /&gt;
|                                                             |                                    &lt;BR /&gt;
+-------------------------------------------------------------+[/pre]&lt;BR /&gt;
missing values for "number1" were created for blank datalines.&lt;BR /&gt;
&lt;BR /&gt;
PeterC</description>
      <pubDate>Mon, 31 Mar 2008 20:00:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7898#M242</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2008-03-31T20:00:28Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7899#M243</link>
      <description>Thanks, Peter.  &lt;BR /&gt;
&lt;BR /&gt;
That's a very clever solution.  Hadn't occurred to me at all.</description>
      <pubDate>Mon, 31 Mar 2008 20:03:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7899#M243</guid>
      <dc:creator>alanj</dc:creator>
      <dc:date>2008-03-31T20:03:27Z</dc:date>
    </item>
    <item>
      <title>Re: inporting character data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7900#M244</link>
      <description>Hi:&lt;BR /&gt;
  I was going to recommend the PRX (Perl Regular Expressions) method, too -- but Tim beat me to it. Perl regular expressions are cool and were introduced in SAS 9. For more info, see here:&lt;BR /&gt;
&lt;A href="http://www2.sas.com/proceedings/sugi29/265-29.pdf" target="_blank"&gt;http://www2.sas.com/proceedings/sugi29/265-29.pdf&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://support.sas.com/rnd/base/datastep/perl_regexp/index.html" target="_blank"&gt;http://support.sas.com/rnd/base/datastep/perl_regexp/index.html&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://analytics.ncsu.edu/sesug/2006/AP09_06.PDF" target="_blank"&gt;http://analytics.ncsu.edu/sesug/2006/AP09_06.PDF&lt;/A&gt;&lt;BR /&gt;
&lt;A href="http://www.pharmasug.org/2005/TU02.pdf" target="_blank"&gt;http://www.pharmasug.org/2005/TU02.pdf&lt;/A&gt;&lt;BR /&gt;
&lt;BR /&gt;
I think that otherwise, something with scan, substr, findc, anyalpha and/or notalpha functions could help you divide the address into the leading zeroes and the rest of the field -- but not as elegantly as PRX functions.&lt;BR /&gt;
 &lt;BR /&gt;
cynthia</description>
      <pubDate>Mon, 31 Mar 2008 22:52:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/inporting-character-data/m-p/7900#M244</guid>
      <dc:creator>Cynthia_sas</dc:creator>
      <dc:date>2008-03-31T22:52:55Z</dc:date>
    </item>
  </channel>
</rss>

