Hello,
I would like to import a .txt file into SAS. These variables seem to follow a format, however, they are separated by single space. Could you please give me some suggestions to import this data?
1.the data source of the geographic name is extracted from http://download.geonames.org/export/zip/
the part of data looks like below.
GB DN14 Adlingfleet England ENG East Riding of Yorkshire 11609011 53.6788 -0.7231 4
GB DN14 Howden England ENG East Riding of Yorkshire 11609011 53.7421 -0.8657 4
GB DN14 Spaldington England ENG East Riding of Yorkshire 11609011 53.7927 -0.849 4
GB DN14 Faxfleet England ENG East Riding of Yorkshire 11609011 53.7067 -0.6939 4
GB DN14 Blacktoft England ENG East Riding of Yorkshire 11609011 53.708 -0.7273 4
GB DN14 Barmby on the Marsh England ENG East Riding of Yorkshire 11609011 53.749 -0.9561 4
GB DN14 Whitgift England ENG East Riding of Yorkshire 11609011 53.6953 -0.7671 4
GB DN14 Asselby England ENG East Riding of Yorkshire 11609011 53.7427 -0.8849 4
GB DN14 Newsholme England ENG East Riding of Yorkshire 11609011 53.7125 -0.8909 3
GB DN14 Laxton England ENG East Riding of Yorkshire 11609011 53.7167 -0.8 4
GB DN14 Balkholme England ENG East Riding of Yorkshire 11609011 53.7432 -0.8087 4
GB DN14 Snaith England ENG East Riding of Yorkshire 11609011 53.6911 -1.0286 4
GB DN14 Knedlington England ENG East Riding of Yorkshire 11609011 53.7427 -0.8849 4
GB DN14 Swinefleet England ENG East Riding of Yorkshire 11609011 53.6921 -0.8333 4
GB DN14 Gowdall England ENG East Riding of Yorkshire 11609011 53.693 -1.0597 3
GB DN14 Yokefleet England ENG East Riding of Yorkshire 11609011 53.708 -0.7574 4
GB DN14 Kilpin England ENG East Riding of Yorkshire 11609011 53.7316 -0.8312 4
GB DN14 Rawcliffe England ENG East Riding of Yorkshire 11609011 53.6833 -0.95 4
GB DN14 Airmyn England ENG East Riding of Yorkshire 11609011 53.7207 -0.8996 4
GB DN14 Ousefleet England ENG East Riding of Yorkshire 11609011 53.6982 -0.7409 4
GB DN14 Skelton England ENG East Riding of Yorkshire 11609011 53.7167 -0.8333 4
GB DN14 Eastrington England ENG East Riding of Yorkshire 11609011 53.7604 -0.7932 4
GB DN14 Saltmarshe England ENG East Riding of Yorkshire 11609011 53.7216 -0.8049 4
GB DN14 Old Goole England ENG East Riding of Yorkshire 11609011 53.7125 -0.8909 3
GB DN14 Goole England ENG East Riding of Yorkshire 11609011 53.7167 -0.8667 4
GB DN14 Reedness England ENG East Riding of Yorkshire 11609011 53.6993 -0.7986 4
GB DN14 Hook England ENG East Riding of Yorkshire 11609011 53.7205 -0.8479 4
GB DN14 Gribthorpe England ENG East Riding of Yorkshire 11609011 53.7125 -0.8909 3
GB DN14 East Cowick England ENG East Riding of Yorkshire 11609011 53.7125 -0.8909 3
GB DN14 Bellasize England ENG East Riding of Yorkshire 11609011 53.7125 -0.8909 3
GB DN14 Rawcliffe Bridge England ENG East Riding of Yorkshire 11609011 53.7125 -0.8909 3
GB DN14 Pollington England ENG East Riding of Yorkshire 11609011 53.6709 -1.0724 4
The data format is tab-delimited text in utf8 encoding, with the following fields :
country code : iso country code, 2 characters
postal code : varchar(20)
place name : varchar(180)
admin name1 : 1. order subdivision (state) varchar(100)
admin code1 : 1. order subdivision (state) varchar(20)
admin name2 : 2. order subdivision (county/province) varchar(100)
admin code2 : 2. order subdivision (county/province) varchar(20)
admin name3 : 3. order subdivision (community) varchar(100)
admin code3 : 3. order subdivision (community) varchar(20)
latitude : estimated latitude (wgs84)
longitude : estimated longitude (wgs84)
accuracy : accuracy of lat/lng from 1=estimated, 4=geonameid, 6=cent
(source: http://download.geonames.org/export/zip/)
I use the following code
data Sa_step6.Geo_GB;
infile "C:\Users\70660\Downloads\GeoNames\GB.txt"
dlm=' ' ;
input
country_code :$2. /*iso country code, 2 characters*/
postal_code :$20. /*varchar(20)*/
place_name :$180. /*varchar(180)*/
admin_name1 :$100. /*1. order subdivision (state) varchar(100)*/
admin_code1 :$20. /*1. order subdivision (state) varchar(20)*/
admin_name2 :$100. /*2. order subdivision (county/province) varchar(100)*/
admin_code2 :$20. /*2. order subdivision (county/province) varchar(20)*/
admin_name3 :$100. /*3. order subdivision (community) varchar(100)*/
admin_code3 :$20. /*3. order subdivision (community) varchar(20)*/
latitude :7.4 /*estimated latitude (wgs84)*/
longitude :7.4 /*estimated longitude (wgs84)*/
accuracy :1 /*accuracy of lat/lng from 1=estimated, 4=geonameid, 6=centroid of addresses or shape*/
;
run;
However, the result is
NOTE: Invalid data for latitude in line 3 38-43.
NOTE: Invalid data for longitude in line 3 45-46.
NOTE: Invalid data for accuracy in line 3 1-36.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9--
3 CHAR GB.DN14.Spaldington.England.ENG.East Riding of Yorkshire.11609011...53.7927.-0.849.4 84
ZONE 440443305766666676604666666044404677256666626625676766760333333330003323333023233303
NUMR 7294E149301C49E74FE95E7C1E495E79513402949E70F609F2B3892591160901199953E79279D0E84994
country_code=GB postal_code=Riding place_name=of admin_name1=Yorkshire 11609011 53.6788 -0.7231 4
admin_code1=GB DN14 Howden Engla admin_name2=Riding admin_code2=of
admin_name3=Yorkshire 11609011 53.7421 -0.8657 4 admin_code3=GB DN14 Spaldington latitude=.
longitude=. accuracy=. _ERROR_=1 _N_=1
NOTE: Invalid data for latitude in line 6 16-17.
NOTE: Invalid data for longitude in line 6 19-21.
NOTE: Invalid data for accuracy in line 6 1-14.
6 CHAR GB.DN14.Barmby on the Marsh.England.ENG.East Riding of Yorkshire.11609011...53.749.-0.9561.4 92
ZONE 44044330467667266276624677604666666044404677256666626625676766760333333330003323330232333303
NUMR 7294E149212D290FE04850D123895E7C1E495E79513402949E70F609F2B3892591160901199953E7499D0E956194
country_code=GB postal_code=Riding place_name=of admin_name1=Yorkshire 11609011 53.7067 -0.6939 4
admin_code1=GB DN14 Blacktoft En admin_name2=Riding admin_code2=of
admin_name3=Yorkshire 11609011 53.708 -0.7273 4 admin_code3=GB DN14 Barmby latitude=. longitude=.
accuracy=. _ERROR_=1 _N_=2
NOTE: Invalid data for latitude in line 9 36-41.
NOTE: Invalid data for longitude in line 9 43-44.
NOTE: Invalid data for accuracy in line 9 1-34.
Could you please give me some suggestions?
Besides,
Do you know which database includes all place name around the world?
I hope to show them in the following way.
Sovereign state | Country | Region | Shire county | District |
United Kingdom | England | East | Essex | Colchester |
Use
infile "C:\Users\70660\Downloads\GeoNames\GB.txt" dlm='09'x dsd;
to read a tab-delimited file.
For the numbers, use a 12. informat (it takes care of the decimal points by itself); everything else looks OK at first glance.
Use
infile "C:\Users\70660\Downloads\GeoNames\GB.txt" dlm='09'x dsd;
to read a tab-delimited file.
For the numbers, use a 12. informat (it takes care of the decimal points by itself); everything else looks OK at first glance.
I find that it always easier to define the variables first, before using them in INPUT or assignment statements. Then SAS doesn't need to guess. So take your specs.
country code : iso country code, 2 characters postal code : varchar(20) place name : varchar(180) admin name1 : 1. order subdivision (state) varchar(100) admin code1 : 1. order subdivision (state) varchar(20) admin name2 : 2. order subdivision (county/province) varchar(100) admin code2 : 2. order subdivision (county/province) varchar(20) admin name3 : 3. order subdivision (community) varchar(100) admin code3 : 3. order subdivision (community) varchar(20) latitude : estimated latitude (wgs84) longitude : estimated longitude (wgs84) accuracy : accuracy of lat/lng from 1=estimated, 4=geonameid, 6=cent
and convert them into a LENGTH statement. SAS always stores numbers as 8 byte floating point numbers so the length is 8.
length
country_code $2
postal_code $20
place_name $180
admin_name1 $100
admin_code1 $20
admin_name2 $100
admin_code2 $20
admin_name3 $100
admin_code3 $20
latitude 8
longitude 8
accuracy 8
;
Then you just need to add your data , infile and input statements. You could also add LABEL statement. If you had and DATE, TIME or DATETIME values then you will need FORMAT statement and either an INFORMAT statement or add informats into the INPUT statement. Note that for normal variables there is no need for informat specifications in the INPUT statement, SAS already knows how to read in numbers and character strings.
Note that if you have defined the variables in the dataset in the same order they appear in the text file the input statement can just use a positional variable list.
data Sa_step6.Geo_GB;
infile "C:\Users\70660\Downloads\GeoNames\GB.txt" dsd dlm='09'x truncover;
length
country_code $2
postal_code $20
place_name $180
admin_name1 $100
admin_code1 $20
admin_name2 $100
admin_code2 $20
admin_name3 $100
admin_code3 $20
latitude 8
longitude 8
accuracy 8
;
input country_code -- accuracy ;
run;
Note that if your file has a header row then add the FIRSTOBS=2 option to the INFILE statement to skip that header line.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.