BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Alexxxxxxx
Pyrite | Level 9

Hello,

 

I would like to import a .txt file into SAS. These variables seem to follow a format, however, they are separated by single space. Could you please give me some suggestions to import this data?

 

 

1.the data source of the geographic name is extracted from http://download.geonames.org/export/zip/

the part of data looks like below.

 

GB	DN14	Adlingfleet	England	ENG	East Riding of Yorkshire	11609011			53.6788	-0.7231	4
GB	DN14	Howden	England	ENG	East Riding of Yorkshire	11609011			53.7421	-0.8657	4
GB	DN14	Spaldington	England	ENG	East Riding of Yorkshire	11609011			53.7927	-0.849	4
GB	DN14	Faxfleet	England	ENG	East Riding of Yorkshire	11609011			53.7067	-0.6939	4
GB	DN14	Blacktoft	England	ENG	East Riding of Yorkshire	11609011			53.708	-0.7273	4
GB	DN14	Barmby on the Marsh	England	ENG	East Riding of Yorkshire	11609011			53.749	-0.9561	4
GB	DN14	Whitgift	England	ENG	East Riding of Yorkshire	11609011			53.6953	-0.7671	4
GB	DN14	Asselby	England	ENG	East Riding of Yorkshire	11609011			53.7427	-0.8849	4
GB	DN14	Newsholme	England	ENG	East Riding of Yorkshire	11609011			53.7125	-0.8909	3
GB	DN14	Laxton	England	ENG	East Riding of Yorkshire	11609011			53.7167	-0.8	4
GB	DN14	Balkholme	England	ENG	East Riding of Yorkshire	11609011			53.7432	-0.8087	4
GB	DN14	Snaith	England	ENG	East Riding of Yorkshire	11609011			53.6911	-1.0286	4
GB	DN14	Knedlington	England	ENG	East Riding of Yorkshire	11609011			53.7427	-0.8849	4
GB	DN14	Swinefleet	England	ENG	East Riding of Yorkshire	11609011			53.6921	-0.8333	4
GB	DN14	Gowdall	England	ENG	East Riding of Yorkshire	11609011			53.693	-1.0597	3
GB	DN14	Yokefleet	England	ENG	East Riding of Yorkshire	11609011			53.708	-0.7574	4
GB	DN14	Kilpin	England	ENG	East Riding of Yorkshire	11609011			53.7316	-0.8312	4
GB	DN14	Rawcliffe	England	ENG	East Riding of Yorkshire	11609011			53.6833	-0.95	4
GB	DN14	Airmyn	England	ENG	East Riding of Yorkshire	11609011			53.7207	-0.8996	4
GB	DN14	Ousefleet	England	ENG	East Riding of Yorkshire	11609011			53.6982	-0.7409	4
GB	DN14	Skelton	England	ENG	East Riding of Yorkshire	11609011			53.7167	-0.8333	4
GB	DN14	Eastrington	England	ENG	East Riding of Yorkshire	11609011			53.7604	-0.7932	4
GB	DN14	Saltmarshe	England	ENG	East Riding of Yorkshire	11609011			53.7216	-0.8049	4
GB	DN14	Old Goole	England	ENG	East Riding of Yorkshire	11609011			53.7125	-0.8909	3
GB	DN14	Goole	England	ENG	East Riding of Yorkshire	11609011			53.7167	-0.8667	4
GB	DN14	Reedness	England	ENG	East Riding of Yorkshire	11609011			53.6993	-0.7986	4
GB	DN14	Hook	England	ENG	East Riding of Yorkshire	11609011			53.7205	-0.8479	4
GB	DN14	Gribthorpe	England	ENG	East Riding of Yorkshire	11609011			53.7125	-0.8909	3
GB	DN14	East Cowick	England	ENG	East Riding of Yorkshire	11609011			53.7125	-0.8909	3
GB	DN14	Bellasize	England	ENG	East Riding of Yorkshire	11609011			53.7125	-0.8909	3
GB	DN14	Rawcliffe Bridge	England	ENG	East Riding of Yorkshire	11609011			53.7125	-0.8909	3
GB	DN14	Pollington	England	ENG	East Riding of Yorkshire	11609011			53.6709	-1.0724	4

The data format is tab-delimited text in utf8 encoding, with the following fields :

country code : iso country code, 2 characters
postal code : varchar(20)
place name : varchar(180)
admin name1 : 1. order subdivision (state) varchar(100)
admin code1 : 1. order subdivision (state) varchar(20)
admin name2 : 2. order subdivision (county/province) varchar(100)
admin code2 : 2. order subdivision (county/province) varchar(20)
admin name3 : 3. order subdivision (community) varchar(100)
admin code3 : 3. order subdivision (community) varchar(20)
latitude : estimated latitude (wgs84)
longitude : estimated longitude (wgs84)
accuracy : accuracy of lat/lng from 1=estimated, 4=geonameid, 6=cent

(source: http://download.geonames.org/export/zip/)

 

I use the following code 

data Sa_step6.Geo_GB;
		infile "C:\Users\70660\Downloads\GeoNames\GB.txt"
		dlm=' ' ;
		input
country_code      :$2.                      /*iso country code, 2 characters*/
postal_code       :$20.                     /*varchar(20)*/
place_name        :$180.                    /*varchar(180)*/
admin_name1       :$100.                    /*1. order subdivision (state) varchar(100)*/
admin_code1       :$20.                     /*1. order subdivision (state) varchar(20)*/
admin_name2       :$100.                    /*2. order subdivision (county/province) varchar(100)*/
admin_code2       :$20.                     /*2. order subdivision (county/province) varchar(20)*/
admin_name3       :$100.                    /*3. order subdivision (community) varchar(100)*/
admin_code3       :$20.                     /*3. order subdivision (community) varchar(20)*/
latitude          :7.4                      /*estimated latitude (wgs84)*/
longitude         :7.4                      /*estimated longitude (wgs84)*/
accuracy          :1                        /*accuracy of lat/lng from 1=estimated, 4=geonameid, 6=centroid of addresses or shape*/
;
run;

However, the result is

NOTE: Invalid data for latitude in line 3 38-43.
NOTE: Invalid data for longitude in line 3 45-46.
NOTE: Invalid data for accuracy in line 3 1-36.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9--

3   CHAR  GB.DN14.Spaldington.England.ENG.East Riding of Yorkshire.11609011...53.7927.-0.849.4 84
    ZONE  440443305766666676604666666044404677256666626625676766760333333330003323333023233303
    NUMR  7294E149301C49E74FE95E7C1E495E79513402949E70F609F2B3892591160901199953E79279D0E84994
country_code=GB postal_code=Riding place_name=of admin_name1=Yorkshire	11609011			53.6788	-0.7231	4
admin_code1=GB	DN14	Howden	Engla admin_name2=Riding admin_code2=of
admin_name3=Yorkshire	11609011			53.7421	-0.8657	4 admin_code3=GB	DN14	Spaldington	 latitude=.
longitude=. accuracy=. _ERROR_=1 _N_=1
NOTE: Invalid data for latitude in line 6 16-17.
NOTE: Invalid data for longitude in line 6 19-21.
NOTE: Invalid data for accuracy in line 6 1-14.

6   CHAR  GB.DN14.Barmby on the Marsh.England.ENG.East Riding of Yorkshire.11609011...53.749.-0.9561.4 92
    ZONE  44044330467667266276624677604666666044404677256666626625676766760333333330003323330232333303
    NUMR  7294E149212D290FE04850D123895E7C1E495E79513402949E70F609F2B3892591160901199953E7499D0E956194
country_code=GB postal_code=Riding place_name=of admin_name1=Yorkshire	11609011			53.7067	-0.6939	4
admin_code1=GB	DN14	Blacktoft	En admin_name2=Riding admin_code2=of
admin_name3=Yorkshire	11609011			53.708	-0.7273	4 admin_code3=GB	DN14	Barmby latitude=. longitude=.
accuracy=. _ERROR_=1 _N_=2
NOTE: Invalid data for latitude in line 9 36-41.
NOTE: Invalid data for longitude in line 9 43-44.
NOTE: Invalid data for accuracy in line 9 1-34.

Could you please give me some suggestions?

 

Besides,

Do you know which database includes all place name around the world?

 

I hope to show them in the following way.

Sovereign state

Country

Region

Shire county

District

United Kingdom

England

East

Essex

Colchester

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Use

infile "C:\Users\70660\Downloads\GeoNames\GB.txt" dlm='09'x dsd;

to read a tab-delimited file.

For the numbers, use a 12. informat (it takes care of the decimal points by itself); everything else looks OK at first glance.

View solution in original post

2 REPLIES 2
Kurt_Bremser
Super User

Use

infile "C:\Users\70660\Downloads\GeoNames\GB.txt" dlm='09'x dsd;

to read a tab-delimited file.

For the numbers, use a 12. informat (it takes care of the decimal points by itself); everything else looks OK at first glance.

Tom
Super User Tom
Super User

I find that it always easier to define the variables first, before using them in INPUT or assignment statements. Then SAS doesn't need to guess.  So take your specs.

country code : iso country code, 2 characters
postal code : varchar(20)
place name : varchar(180)
admin name1 : 1. order subdivision (state) varchar(100)
admin code1 : 1. order subdivision (state) varchar(20)
admin name2 : 2. order subdivision (county/province) varchar(100)
admin code2 : 2. order subdivision (county/province) varchar(20)
admin name3 : 3. order subdivision (community) varchar(100)
admin code3 : 3. order subdivision (community) varchar(20)
latitude : estimated latitude (wgs84)
longitude : estimated longitude (wgs84)
accuracy : accuracy of lat/lng from 1=estimated, 4=geonameid, 6=cent

and convert them into a LENGTH statement. SAS always stores numbers as 8 byte floating point numbers so the length is 8.

length
country_code  $2
postal_code $20
place_name $180
admin_name1 $100 
admin_code1 $20
admin_name2 $100 
admin_code2 $20 
admin_name3 $100 
admin_code3 $20 
latitude 8
longitude 8
accuracy 8
;

Then you just need to add your data , infile and input statements. You could also add LABEL statement.  If you had and DATE, TIME or DATETIME values then you will need FORMAT statement and either an INFORMAT statement or add informats into the INPUT statement. Note that for normal variables there is no need for informat specifications in the INPUT statement, SAS already knows how to read in numbers and character strings.

Note that if you have defined the variables in the dataset in the same order they appear in the text file the input statement can just use a positional variable list.

data Sa_step6.Geo_GB;
  infile "C:\Users\70660\Downloads\GeoNames\GB.txt" dsd dlm='09'x truncover;
  length
    country_code  $2
    postal_code $20
    place_name $180
    admin_name1 $100 
    admin_code1 $20
    admin_name2 $100 
    admin_code2 $20 
    admin_name3 $100 
    admin_code3 $20 
    latitude 8
    longitude 8
    accuracy 8
  ;
  input country_code -- accuracy ;
run;

Note that if your file has a header row then add the FIRSTOBS=2 option to the INFILE statement to skip that header line.

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 791 views
  • 1 like
  • 3 in conversation