BookmarkSubscribeRSS Feed
uabcms
Calcite | Level 5

I have a single column excel file with full name on first line followed be some form of an address on the next 1 to 3 lines.

I need to create observations with separate variables for first name, last name, first and second address(if  available), city, state and zip.

A comma always follows the city name, no matter what the configuration of the data is.

column 1

Doe, jane

33 first st
anywhere, al  35207
Doe, Janie
3245 Southview Cr  Homewood, AL  34564
Doe, John
PO Box 234
Bham, AL   35426
Doe, Johnny
32948 Clairmont Ave
#5
Hoover, AL   23455
3 REPLIES 3
ballardw
Super User

Good luck. With single addresses on multiple lines and incorrect ZIP code state combinations (23455 is VA not AL), the street and city/state on one line I hope you don't too many of these.

With something like this I'd start with:

No digits -> likely to be a name

Fewer than 5 or more than 5 but fewer than 9 digits -> likely to be at least part of a street address

If exactly 5 or 9 digits and a comma then check the number of a valid zip code (function ZIPCITY may help).

Names are likely to be a headache all by themselves depending on who compiled this:

Smith Jr. , John

Smith , John Jr

John Smith Jr

Dr John Smith Jr (and permutations and other honorifics Miss, Ms, Mrs, Mr, Reverand, Doctor and others abbreviated and/or spelled out)

For *********** sake, this is 2015. Who is still putting first and last name in the same field (government agencies aside).

If this spreadsheet was exported from another application go back and research output options from that app.

uabcms
Calcite | Level 5

Yeah, it's a mess.  Am checking other output options.

BTW,  i just made started typing #'s for the zips for the example data.  The zips in the file are correct.

Thanks for responding.

Ksharp
Super User

If there was only two words for Name , that would be easy to group them.

data have;
input x $50.;
if countw(x,' ,') = 2 and not anydigit(x) then group+1;
cards;
Doe, jane
33 first st
anywhere, al  35207
Doe, Janie
3245 Southview Cr  Homewood, AL  34564
Doe, John
PO Box 234
Bham, AL   35426
Doe, Johnny
32948 Clairmont Ave
#5
Hoover, AL   23455
;
run;

Xia Keshan

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1266 views
  • 0 likes
  • 3 in conversation