Re: Importing CSV File

Lane · Posted 04-12-2011 04:37 PM

SAS newbie here. I am trying to import a CSV file. Two columns are giving me trouble. I have county names in one column and state names in the other. Anything over 6 to 8 characters gets cut off in these columns once imported. Any idea on how I can keep SAS from cutting off my columns. I apologize if this is a problem that can be found by searching the forums, but I simply don't know what keywords to use to search for this problem.

Also, it is too big to convert to XLS. 400,000+ rows of data.

Thanks in advance

RickM · Posted 04-12-2011 04:38 PM

Try using a length statement before you input the data.

Good luck!

Lane · Posted 04-12-2011 04:47 PM

Rick, thanks for the reply. You'll have to forgive me since I'm drastically unfamiliar with SAS. Where would the length code fall within the following import code? And what length is appropriate? (I'm assuming that, say "length county $15" doesn't specify 15 characters...am I correct here?)

PROC IMPORT OUT= WORK.states
DATAFILE= "C:\Documents and Settings\dregiste\states.csv"
DBMS=csv REPLACE;
GETNAMES=YES;
RUN;

RickM · Posted 04-12-2011 05:18 PM

I made a test csv file and used your code and it worked fine with me. I usually use the datastep approach rather than proc import so I'm not sure why it isn't working for you. You could try using

PROC IMPORT OUT= WORK.states
DATAFILE= "C:\states.csv"
DBMS=csv REPLACE;
GETNAMES=YES;
GUESSINGROWS=1000;
RUN;

state.csv
------------------------------------
State,City
Massachusetts,Boston

Message was edited by: RickM Message was edited by: RickM

Lane · Posted 04-12-2011 05:31 PM

I added your guessingrows command and in the log i found a data step code with the informats and formats. I copied and pasted that and increased the number for county and state in both format and informat and that got it to work! Thanks for your help Rick!

Peter_C · Posted 05-02-2011 09:46 AM

Lane
don't use proc import,
Instead use a data step with
an infile pointing to your data,
length statements defining your columns
and finally an input statement to read them.

Then you will have control over the parsing required.

The basic idea takes longer to describe that to demo
Imagine I have a pipe-delimited file stored as c:\temp\gambol.csv holding text like
id|name|state|money|date
1|me|NV|$123,456.78|2-Apr-2011
2|two of us are better|NV|-123,456.78|3-Apr-2011

Then this is the simple data step to read that[pre]data gambling ;
infile 'c:\temp\gambol.csv' DSD dlm='|' lrecl= 10000 firstobs=2 ;
length id 8 name $30 state $2 money 8 date 6 ;
informat money dollar20. date date11. ;
format money dollar20. date date11. ;
input id -- date ;
run ;[/pre]The length statement ensures name can hold up to 30 characters.
The DSD option on the infile statement invokes CSV-type parsing. DLM= indicates the delimiter.
That informat statement is very handy as it indicates how to parse those columns which are not simple text or numbers. The format statements ensures the internal storage is converted to something more like the input when you display this data.
An INPUT statement defines the columns to read - and as SAS already knows about the columns you want to read, and learned about them in the order in which they appear on the input file, you need only define the first and last separated by the indication (--) that you want all columns defined from ID to DATE.
The syntax is probably not shorter than proc import, but is much more reliable because SAS is not required to make the judgement of how far to read through the file to decide what informat to use for each column.
However, for a quick load of unknown data, proc import might be helpful.
With 400,000 rows to read I imagine your input data are already well documented, so no need to ask proc import to make up its own definitions.

peterC

Importing CSV File