BookmarkSubscribeRSS Feed
gsnidow
Obsidian | Level 7

Greetings.  I'm working on some addresses (thank you Art and Tom), that I am needing to parse and split into columns that can be joined to a DB2 table.  I have one column in a spreadsheet containing the address, and I need to pull out the house number, the cardinal direction if present, and the street name.  So, for example, if the address is '100 W Oak St', I need to pull out 100, 'W', and 'Oak'.  I do not need the 'St', because it won't match what is the database in most cases, so I just want to omit it.  I am parsing the address into an array, then pulling out the values based on some simple rules.  I am good to go up until I need to stop building the street name at 'St' or 'Rd', etc.  My thought was that I could select all the street names into a variable, then use it in a list to stop the loop.  However, I also need to include all instances of street names appended with a '.' and a ',' since these are often found following the street abbreviation.  The attached file contains the code, and hopefully adequate comments.  Dataset 'Step1' is just what I need, but with the 'Rd', 'St', etc lopped off the end.  Also, I realize streets like 'St Asaph', like is in the data will cause a problem to logic looking for a street abbreviation, and that is okay.  I'm not looking for a 100% match to DB2, and since those are not common, I'm okay with leaving them blank.

Thank you

Greg

10 REPLIES 10
art297
Opal | Level 21

If your addresses are really that clean, I would just approach the problem differently.  How about something like?:

data want (drop=_:);

  set addresses;

  length AD_HSE_NO 8.; *AD_HSE_NO is the database column for house number;

  length AD_CRDNL_DIR $2.; *AD_CRDNL_DIR is the database column for direction;

  length AD_STR_NM $50.; *AD_STR_NM is the database name for street name;

  CALL SCAN(address, 2, _position, _length );

  AD_HSE_NO=input(substr(address,1,_position-2),12.);

  AD_STR_NM=strip(translate(substr(address,_position)," ","."));

  if substr(AD_STR_NM,length(AD_STR_NM)-1) in (" N"," S"," E"," W") then do;

    AD_CRDNL_DIR=substr(AD_STR_NM,length(AD_STR_NM)-1);

    AD_STR_NM=strip(substr(AD_STR_NM,1,length(AD_STR_NM)-2));

  end;

  else if substr(AD_STR_NM,1,2) in ("N ","S ","E ","W ") then do;

    AD_CRDNL_DIR=substr(AD_STR_NM,1,1);

    AD_STR_NM=strip(substr(AD_STR_NM,3));

  end;

  CALL SCAN(AD_STR_NM, -1, _position, _length );

  AD_STR_NM=strip(substr(AD_STR_NM,1,_position-1));

run;

gsnidow
Obsidian | Level 7

Thank you Arthur.  And therein lies the problem, they are not really that clean, I simply pulled out some of the easy ones to start with.  For example, an address might be '100 Oak St behind garage'.  The street abbreviation is not always in the last position, or I could do like you have done.  I did not know about SCAN, I'll read up on that.  Also, I put the array in there just so I could see what was happening, but I got rid of it now so it all happens in one pass.  I'm still stuck on an easy way to stop at the street abbreviation.  Thank you so much for your good time and effort.

Greg


Linlin
Lapis Lazuli | Level 10

this is what you want?

data Addresses;

  input Address & $100.;

  datalines;

200 Cameron Station Blvd

25 W Reed Ave

2701 Commonwealth Ave

5000 Echols Ave

3001 N Beauregard St

5650 Sanger Ave

5651 Rayburn Ave

530 S St Asaph St

215 Las Gaviotas Blvd.

4032 Bunch Walnuts Road

201 Stadium Dr

1116 Hillwell Rd

640 Battlefield Blvd S

298 Cedar Rd

212 Holt Dr

1280 Bells Mill Rd

1200 Cedar Rd

2901 Margaret Booker Dr

509 George Washington Hwy N

;

run;

data test;

  set addresses;

  n=max(find(address,'Blvd'),find(address,'Ave'),find(address,'St'),find(address,'Road'),find(address,'Dr'),find(address,'Hwy'),find(address,'Rd'));

  number=scan(address,1,' ');

  add=substr(address,anyalpha(address),n-length(number)-2);

  drop n;

proc print;run;

Obs    Address                        number    add

           1    200 Cameron Station Blvd        200      Cameron Station

           2    25 W Reed Ave                   25       W Reed

           3    2701 Commonwealth Ave           2701     Commonwealth

           4    5000 Echols Ave                 5000     Echols

           5    3001 N Beauregard St            3001     N Beauregard

           6    5650 Sanger Ave                 5650     Sanger

           7    5651 Rayburn Ave                5651     Rayburn

           8    530 S St Asaph St               530      S

           9    215 Las Gaviotas Blvd.          215      Las Gaviotas

          10    4032 Bunch Walnuts Road         4032     Bunch Walnuts

          11    201 Stadium Dr                  201      Stadium

          12    1116 Hillwell Rd                1116     Hillwell

          13    640 Battlefield Blvd S          640      Battlefield

          14    298 Cedar Rd                    298      Cedar

          15    212 Holt Dr                     212      Holt

          16    1280 Bells Mill Rd              1280     Bells Mill

          17    1200 Cedar Rd                   1200     Cedar

          18    2901 Margaret Booker Dr         2901     Margaret Booker

          19    509 George Washington Hwy N     509      George Washington

gsnidow
Obsidian | Level 7

Linlin, that's close, but I need to have the cardinal direction in its own colum, as I have in my example.  What I am trying to avoid is hard coding all the possible values of street abbreviations.  Thank you.

art297
Opal | Level 21

You could add Linlin's suggestion onto the code I proposed.  Probably still won't solve all of your problems and, yes, would have to be adapted to include all of the street type words and abbreviations that exist in your data, but should get you close:

data want (drop=_:);

  set addresses;

  length AD_HSE_NO 8.; *AD_HSE_NO is the database column for house number;

  length AD_CRDNL_DIR $2.; *AD_CRDNL_DIR is the database column for direction;

  length AD_STR_NM $50.; *AD_STR_NM is the database name for street name;

  CALL SCAN(address, 2, _position, _length );

  AD_HSE_NO=input(substr(address,1,_position-2),12.);

  AD_STR_NM=strip(translate(substr(address,_position)," ","."));

  if substr(AD_STR_NM,length(AD_STR_NM)-1) in (" N"," S"," E"," W") then do;

    AD_CRDNL_DIR=substr(AD_STR_NM,length(AD_STR_NM)-1);

    AD_STR_NM=strip(substr(AD_STR_NM,1,length(AD_STR_NM)-2));

  end;

  else if substr(AD_STR_NM,1,2) in ("N ","S ","E ","W ") then do;

    AD_CRDNL_DIR=substr(AD_STR_NM,1,1);

    AD_STR_NM=strip(substr(AD_STR_NM,3));

  end;

  _n=max(find(address,'Blvd'),find(address,'Ave'),find(address,'St'),

        find(address,'Road'),find(address,'Dr'),find(address,'Hwy'),

        find(address,'Rd'));

  if _n gt 0 then AD_STR_NM=strip(substr(AD_STR_NM,1,_n-1));

run;

gsnidow
Obsidian | Level 7

Yes, I might have to revert back to good old SQL for this one.  My initial reason for not wanting to hard code all the possible values of street abbreviation is that I don't want to have a line of code that is a thousand characters wide.  Is there a way to continue a statment on the next line?  For example, consider the below...

if x in ('a','b','c','d','e','f','g','h') then do;

In VB I can do this...

if x in ('a','b','c','d' & _

          'e','f','g','h') then do;

where the '& _' tells the compiler to continue to the next line.  Can we do this in SAS?  If so, I would not mind typing all the abbreviations out.

Thank you

Greg

Linlin
Lapis Lazuli | Level 10

Yes.

"if x in ('a','b','c','d','e','f','g','h') then do;"

is the same as

"if x in ('a'

,'b'

,'c'

,'d','e','f','g','h')

then do;"

gsnidow
Obsidian | Level 7

DOH!!!  That seemed to easy to try.  I got so caught up on having a ';' at the end of every line, I did not even think of that.  Thank you.

Greg

gsnidow
Obsidian | Level 7

Arthur, back to the scan function.  Boy, did that eliminate my need for all the substrings and parsing I did manually. It seems scan is programmed to do exactly what I was doing, for the most part. One thing I cannot wrap my head around is why sometimes it is necessary to use 'call'.  In this case, I can simply do...

do i = 1 to 10 by 1;

     x = scan(string,i);

end;

What is the 'call' doing?  Thank you.

Greg

data_null__
Jade | Level 19

As you already know functions and CALL routines are similar.  In general functions return a value and CALL routines alter the values of variables.

With regards to scan, the function returns the string that is being scanned for.  The call routine returns information about the location of that string.  Depending on what you are doing the information may be more useful than the string.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1089 views
  • 6 likes
  • 4 in conversation