11-07-2016 04:20 PM
I have a question about reading a specific type of raw data.
Suppose I have data that looks something like this:
340 West 85th Street
New York NY 10024
7901 Annapolis Road
Lanham MD 20706
I want the first line to be read as Address, the second line as City (New York), State (NY), and Zip (10024), and the third line as Latitude (40.79) and Longitude (-73.98).
So the code may look something like this (it is not correct):
data temp;
  input #1 Address $
        #2 City $ State $ Zip
        #3 Latitude Longitude;
  datalines;
...
;
run;
Of course, this does not work well. I am just wondering how to read raw data like this, where each record spans multiple lines.
11-07-2016 04:53 PM - edited 11-07-2016 04:55 PM
You could write something like this, as long as you make sure there are two blanks between the city and the state. With the & modifier, two consecutive blanks act as the delimiter, so a value such as "New York" can contain a single blank.
I gave the city and address variables a length of 30.
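data temp;
  length address $30 city $30 state $2 zip $5;
  /* & reads a value with embedded single blanks; a bare 2 or 5 after a name would be column input */
  input #1 Address & $30.
        #2 City & $30. State $ Zip $
        #3 Latitude Longitude;
  datalines;
...
;
run;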
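As a minimal sketch of the two-blank approach, the step below reads only the New York record from the question, with the coordinates 40.79 and -73.98 taken from the question text. The two blanks typed between "New York" and "NY", the informat widths, and the PROC PRINT at the end are assumptions added for illustration.

```sas
data temp;
  length address $30 city $30 state $2 zip $5;
  /* & lets a value contain single blanks; two blanks (or end of line) end it */
  input #1 address & $30.
        #2 city & $30. state $ zip $
        #3 latitude longitude;
  datalines;
340 West 85th Street
New York  NY 10024
40.79 -73.98
;
run;

/* Inspect the result: one observation with six variables */
proc print data=temp;
run;
```

Because the highest line pointer is #3, SAS consumes three input lines per observation, so every record in the raw data must have all three lines.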
11-07-2016 05:02 PM
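The fact that there are multiple words per variable makes the processing a little tricky. Here's one way to approach it, using these statements inside the DATA step:
length address $ 50 city $ 30 state $ 2 zip $ 5 dummy $ 1;
input dummy;   /* line 1: loads the record into _infile_ */
address = _infile_;
input dummy;   /* line 2 */
zip = scan(_infile_, -1);
state = scan(_infile_, -2);
city = substr(_infile_, 1, length(_infile_)-9);   /* drop the trailing 9 characters, e.g. ' NY 10024' */
input latitude longitude;   /* line 3 */
drop dummy;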
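You might want to inspect the ZIP codes to make sure none of them is longer than five characters, since the city calculation assumes the state and ZIP code take up exactly the last nine characters of the line. Also note that it's better to make the ZIP code character instead of numeric, so you won't have to worry about losing leading zeros.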
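One way to do that inspection is sketched below. It pulls the last blank-delimited word from each city line and flags values longer than five characters. The dataset and variable names (zipcheck, zip_token, zip_len, long_zip) are made up for this sketch, and the Boston line is a hypothetical ZIP+4 value added only to show a flagged case. The blank is passed to SCAN explicitly because SCAN's default delimiter list also includes the hyphen, which would split a ZIP+4 value.

```sas
data zipcheck;
  length zip_token $ 10;
  input;                                  /* load the whole line into _infile_ */
  zip_token = scan(_infile_, -1, ' ');    /* last blank-delimited word */
  zip_len   = length(zip_token);
  long_zip  = (zip_len > 5);              /* 1 when the value would not fit in $5 */
  datalines;
New York NY 10024
Lanham MD 20706
Boston MA 02108-1234
;
run;

/* List only the lines whose ZIP value is longer than five characters */
proc print data=zipcheck;
  where long_zip = 1;
run;
```

Keeping zip_token as a character variable also preserves the leading zero in a value like the hypothetical 02108 above.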