BookmarkSubscribeRSS Feed
carolynmalone
Calcite | Level 5

I'm having trouble importing an excel CSV document into SAS 9.4. (~2 million rows, should only be 60 variables upon import, but it >250 variables) The first thousand or so lines look good, but eventually SAS is reading in multiple observation lines and putting them into one line creating hundreds of extra variables. Here is the code I've used to import it. 

 

libname CDC "filepath";

 

PROC IMPORT OUT= cdc.filename
DATAFILE= "filepath\file.csv" out=Q12018
dbms=csv replace; delimiter=',';guessingrows=max; getnames=no ; RUN;

 

 

For example instead of data looking like this:

ID        visit number      name       

0           012                 joe

1          123                   jane

2           234                  john

 

It looks like this (after several thousand rows):

ID      visit number     name     var4     var5      var6

0          012                 joe

1        123                  jane       2           234        john

 

Any help is appreciated! Thanks!

 

6 REPLIES 6
Tom
Super User Tom
Super User

CSV files have nothing to do with Excel.  Did you create the CSV file from an actual Excel file?  If so do you still have the Excel file? Can you just import that?

 

SAS will not normally combine rows into one. In fact is is more normal to see the opposite problem where SAS sees more observations that you expect because some individual values include the end of line characters.  Are you sure that the issue is not that your CSV file has poorly matched quotes and is confusing SAS?

 

Why use PROC IMPORT?  Why not just run a data step to read it?  You could start with using the data step that PROC IMPORT generates (although it does generate very ugly code).

carolynmalone
Calcite | Level 5

Thank you. No the vendor sent the file in a "Microsoft Excel Comma Separated Values File" format with a .CSV extension. It's too big to open and save as anything else.

 

I have finally been able to open the document (even though it doesn't load completely due to the number of rows) and it is a problem with his file and not the import statement - it imported properly. Thanks anyways!

 

 

Tom
Super User Tom
Super User

Note that SAS code is a very good tool for looking at files that might be too large for your normal Graphical User Interface tools.

For example you can use a simple data step to look at lines from the file.  For example if you are seeing issues after the 123,402nd observation you could run this little step to see that the 5 lines near there look like by dumping them to the SAS log.

data _null_;
  infile 'myfile.csv'  firstobs=123401 obs=123405 ;
  input ;
  list;
run;
carolynmalone
Calcite | Level 5

Thank you very much for that!

 

I also have this quarter of data split up into 4 files - hopefully it doesn't look like the other, but the extensions are ".CSV.AA", ".CSV.AB", ".CSV.AC", and ".CSV.AD" and they say "AB File" in the filetype. Does anyone know how to import this filetype or read it in? I'm going to have to import it and append them. 

Tom
Super User Tom
Super User

@carolynmalone wrote:

Thank you very much for that!

 

I also have this quarter of data split up into 4 files - hopefully it doesn't look like the other, but the extensions are ".CSV.AA", ".CSV.AB", ".CSV.AC", and ".CSV.AD" and they say "AB File" in the filetype. Does anyone know how to import this filetype or read it in? I'm going to have to import it and append them. 


File extensions are just part of the NAME of the file.  They do not necessarily have anything at all to do with what type of file it actually is.

Just look at the files with your favorite text editor and see what they contain.  You can also just run a quick SAS data step to look at the file.  For example this code will dump the first 5 lines from the file to the SAS log so you can see what they look like.

data _null_;
  infile 'filename.CSV.AA' obs=5 ;
  input;
  list;
run;

Did they split the files by observations (rows)?  If so then the first line of each probably has the same list of variable names in the same order.  Or did they split the files by variables (columns)?  If so then the files should each have the same number of lines (observations). 

Ksharp
Super User

Try termstr= option

 

 

 

FILENAME x "filepath\file.csv"  termstr=lf ;

FILENAME x "filepath\file.csv"  termstr=crlf ;

 

PROC IMPORT OUT= cdc.filename
DATAFILE=x out=Q12018
dbms=csv replace; delimiter=',';guessingrows=max; getnames=no ; RUN;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2036 views
  • 0 likes
  • 3 in conversation