04-04-2018 08:47 AM
I have to import a CSV file (encoded in UTF-8) from an HTTP address. The problem is that the 2nd column contains numeric values (1 to 99) and, at the end of the column, some character values (A to ZZ). These character values are replaced by . during the import.
How can I solve this problem?
Please find my import code below:
filename elect url "http://www.regardscitoyens.org/telechargement/donnees/elections/2014_municipales/municipales-2014-r%C3%A9sultats-bureaux_vote-tour1.csv" encoding="utf-8";
proc import datafile=elect out=elect replace dbms=csv;
  delimiter=";";
  getnames=yes;
run;
04-04-2018 09:01 AM
This is not a Comma Separated Values file!! Aaargh. Sorry, it's just that every post here now states "I am importing a CSV file" and then proceeds to describe a file which does not comply with the definition of a comma separated values file. You are importing a delimited file, and the delimiter is a semicolon. So first off, let's forget about CSV.
Next issue, you're using proc import. This procedure is a guessing procedure: it scans some of the data (by default only the first 20 rows, controlled by GUESSINGROWS) and then tries to guess how each column should be imported. That is why your 2nd column, which starts with numbers, is read as numeric, and the letters further down come out as missing (.). It is not a good idea to use this in any production process, as the output cannot be considered consistent.
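If you only need a quick look at the data and want to stay with proc import, one possible workaround is to make it scan more rows before guessing. This is a minimal sketch reusing the elect fileref from the question. It assumes a SAS release that accepts guessingrows=max (older releases need an explicit number instead), and it is still a guess, just a better informed one.

```sas
/* Sketch: same import as in the question, but scanning all rows */
/* before deciding each column's type.                           */
proc import datafile=elect out=elect replace dbms=csv;
  delimiter=";";
  getnames=yes;
  guessingrows=max;  /* assumption: MAX is accepted by your release */
run;
```

With every row scanned, a column that mixes 1 to 99 with A to ZZ should be typed as character, at the cost of a slower import.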
Based on your data specification, or data transfer agreement, you can build a data step import program which will always (i.e. is repeatable and finite) produce the same results given the same data:
data want;
  infile ".../...tour.dlm" dlm=";";
  length ...;
  format ...;
  informat ...;
  input ...;
run;
With this you specify what data to read in and how to read it, based on your superb knowledge of the data in question. So you would know that column 1 is character, and thus you can write (note the trailing period on the format and informat names):
data want;
  infile ".../...tour.dlm" dlm=";";
  length tour $200 ...;
  format tour $200. ...;
  informat tour $200. ...;
  input tour $ ...;
run;
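For illustration, here is how that skeleton might look against the fileref from the question. It is only a sketch: col1 and col2 are hypothetical names and the lengths are guesses, since the real column layout has to come from the file's specification. It also reads just the first two fields of each line and assumes the first line holds the column names.

```sas
/* Sketch only: col1/col2 and their lengths are assumptions, */
/* not the real layout of the file.                          */
data want;
  infile elect dlm=";" dsd firstobs=2 truncover lrecl=32767;
  length col1 $200 col2 $8;  /* col2 declared character, so A..ZZ survive */
  input col1 $ col2 $;
run;
```

Here dsd treats two consecutive delimiters as a missing value, firstobs=2 skips the header line, truncover keeps short lines from spilling into the next record, and lrecl=32767 guards against long lines being cut off. If you later need the 1 to 99 values as numbers, you can convert them in a separate step with input(col2, ?? 8.), which returns missing for the letters without flooding the log.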
Now, sure, you might not want to write each one out by hand. So look at the log generated by the proc import you already have: you will see it has written a data step import for you. Its guesses are wrong for your data, but you can copy that program into your own and then modify it to read the data correctly.