The SAS Output Delivery System and reporting techniques

reading an excel file

Reply
Occasional Contributor
Posts: 17

reading an excel file

I have a single column excel file with full name on first line followed be some form of an address on the next 1 to 3 lines.

I need to create observations with separate variables for first name, last name, first and second address(if  available), city, state and zip.

A comma always follows the city name, no matter what the configuration of the data is.

column 1

Doe, jane

33 first st
anywhere, al  35207
Doe, Janie
3245 Southview Cr  Homewood, AL  34564
Doe, John
PO Box 234
Bham, AL   35426
Doe, Johnny
32948 Clairmont Ave
#5
Hoover, AL   23455
Super User
Posts: 11,134

Re: reading an excel file

Good luck. With single addresses on multiple lines and incorrect ZIP code state combinations (23455 is VA not AL), the street and city/state on one line I hope you don't too many of these.

With something like this I'd start with:

No digits -> likely to be a name

Fewer than 5 or more than 5 but fewer than 9 digits -> likely to be at least part of a street address

If exactly 5 or 9 digits and a comma then check the number of a valid zip code (function ZIPCITY may help).

Names are likely to be a headache all by themselves depending on who compiled this:

Smith Jr. , John

Smith , John Jr

John Smith Jr

Dr John Smith Jr (and permutations and other honorifics Miss, Ms, Mrs, Mr, Reverand, Doctor and others abbreviated and/or spelled out)

For *********** sake, this is 2015. Who is still putting first and last name in the same field (government agencies aside).

If this spreadsheet was exported from another application go back and research output options from that app.

Occasional Contributor
Posts: 17

Re: reading an excel file

Yeah, it's a mess.  Am checking other output options.

BTW,  i just made started typing #'s for the zips for the example data.  The zips in the file are correct.

Thanks for responding.

Super User
Posts: 9,867

Re: reading an excel file

If there was only two words for Name , that would be easy to group them.

data have;
input x $50.;
if countw(x,' ,') = 2 and not anydigit(x) then group+1;
cards;
Doe, jane
33 first st
anywhere, al  35207
Doe, Janie
3245 Southview Cr  Homewood, AL  34564
Doe, John
PO Box 234
Bham, AL   35426
Doe, Johnny
32948 Clairmont Ave
#5
Hoover, AL   23455
;
run;

Xia Keshan

Ask a Question
Discussion stats
  • 3 replies
  • 439 views
  • 0 likes
  • 3 in conversation