I have a single column excel file with full name on first line followed be some form of an address on the next 1 to 3 lines.
I need to create observations with separate variables for first name, last name, first and second address(if available), city, state and zip.
A comma always follows the city name, no matter what the configuration of the data is.
column 1 |
---|
Doe, jane |
33 first st |
anywhere, al 35207 |
Doe, Janie |
3245 Southview Cr Homewood, AL 34564 |
Doe, John |
PO Box 234 |
Bham, AL 35426 |
Doe, Johnny |
32948 Clairmont Ave |
#5 |
Hoover, AL 23455 |
Good luck. With single addresses on multiple lines and incorrect ZIP code state combinations (23455 is VA not AL), the street and city/state on one line I hope you don't too many of these.
With something like this I'd start with:
No digits -> likely to be a name
Fewer than 5 or more than 5 but fewer than 9 digits -> likely to be at least part of a street address
If exactly 5 or 9 digits and a comma then check the number of a valid zip code (function ZIPCITY may help).
Names are likely to be a headache all by themselves depending on who compiled this:
Smith Jr. , John
Smith , John Jr
John Smith Jr
Dr John Smith Jr (and permutations and other honorifics Miss, Ms, Mrs, Mr, Reverand, Doctor and others abbreviated and/or spelled out)
For *********** sake, this is 2015. Who is still putting first and last name in the same field (government agencies aside).
If this spreadsheet was exported from another application go back and research output options from that app.
Yeah, it's a mess. Am checking other output options.
BTW, i just made started typing #'s for the zips for the example data. The zips in the file are correct.
Thanks for responding.
If there was only two words for Name , that would be easy to group them.
data have; input x $50.; if countw(x,' ,') = 2 and not anydigit(x) then group+1; cards; Doe, jane 33 first st anywhere, al 35207 Doe, Janie 3245 Southview Cr Homewood, AL 34564 Doe, John PO Box 234 Bham, AL 35426 Doe, Johnny 32948 Clairmont Ave #5 Hoover, AL 23455 ; run;
Xia Keshan
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.