02-09-2015 05:07 PM
I have a single column excel file with full name on first line followed be some form of an address on the next 1 to 3 lines.
I need to create observations with separate variables for first name, last name, first and second address(if available), city, state and zip.
A comma always follows the city name, no matter what the configuration of the data is.
|33 first st|
|anywhere, al 35207|
|3245 Southview Cr Homewood, AL 34564|
|PO Box 234|
|Bham, AL 35426|
|32948 Clairmont Ave|
|Hoover, AL 23455|
02-09-2015 06:57 PM
Good luck. With single addresses on multiple lines and incorrect ZIP code state combinations (23455 is VA not AL), the street and city/state on one line I hope you don't too many of these.
With something like this I'd start with:
No digits -> likely to be a name
Fewer than 5 or more than 5 but fewer than 9 digits -> likely to be at least part of a street address
If exactly 5 or 9 digits and a comma then check the number of a valid zip code (function ZIPCITY may help).
Names are likely to be a headache all by themselves depending on who compiled this:
Smith Jr. , John
Smith , John Jr
John Smith Jr
Dr John Smith Jr (and permutations and other honorifics Miss, Ms, Mrs, Mr, Reverand, Doctor and others abbreviated and/or spelled out)
For *********** sake, this is 2015. Who is still putting first and last name in the same field (government agencies aside).
If this spreadsheet was exported from another application go back and research output options from that app.
02-10-2015 10:12 AM
Yeah, it's a mess. Am checking other output options.
BTW, i just made started typing #'s for the zips for the example data. The zips in the file are correct.
Thanks for responding.
02-10-2015 05:27 AM
If there was only two words for Name , that would be easy to group them.
data have; input x $50.; if countw(x,' ,') = 2 and not anydigit(x) then group+1; cards; Doe, jane 33 first st anywhere, al 35207 Doe, Janie 3245 Southview Cr Homewood, AL 34564 Doe, John PO Box 234 Bham, AL 35426 Doe, Johnny 32948 Clairmont Ave #5 Hoover, AL 23455 ; run;