BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dapenDaniel
Obsidian | Level 7

Hello! I have a question about importing CSV file into SAS. 

 

I have more than 300 csv file with identical variables. However, the maximum length for each variable in different file is different. Moreover, I have some extremely large records that has been beyond the length limitation of Excel. So the record is split into 2 rows. To solve these two problems, I added TERMSTR=CRLF and LRECL=32767 (for each file, the number of the observation is less than 30000). My code is following below. However, when I run the code, the log shows that 0 records were read from the infile. What should I do to solve this problems? Is it required to use a format statement to indicate the variable names, type or length?

 

I also consider to use proc import to import CSV file. I can replace lrecl = 32767 with guessingrows = 32767. How about TERMSTR=CRLF? what should I replace this code with?

 

data b04a;
infile 'F:\B04a.csv'
delimiter=","
missover
firstobs=2
TERMSTR=CRLF
lrecl = 32767;
run;

 

NOTE: The infile 'F:\B04a.csv' is:
Filename=F:\B04a.csv,
RECFM=V,LRECL=32767,File Size (bytes)=51416983,
Last Modified=22Sep2018:19:59:59,
Create Time=22Sep2018:19:59:58

NOTE: 0 records were read from the infile 'F:\B04a.csv'.
NOTE: The data set WORK.B04A has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@dapenDaniel wrote:

Hello! I have a question about importing CSV file into SAS. 

 

I have more than 300 csv file with identical variables. However, the maximum length for each variable in different file is different. Moreover, I have some extremely large records that has been beyond the length limitation of Excel. So the record is split into 2 rows. To solve these two problems, I added TERMSTR=CRLF and LRECL=32767 (for each file, the number of the observation is less than 30000). My code is following below. However, when I run the code, the log shows that 0 records were read from the infile. What should I do to solve this problems? Is it required to use a format statement to indicate the variable names, type or length?


The problem with PROC IMPORT for delimited is that it needs to guess at what the file contains since the file does not have that metadata.  That is why you get different lengths for variables based on the particular set of records in the file it is reading.  So instead write your own data step to read the file and you can define the lengths of each variable.

 

As to the second problem you need to provide more information on what types of changes you are seeing when the values are long.  Are the lines wrapping so that instead of one line with N columns you see two lines one with X columns and another with N-X columns? Multiple lines per observation?

Or does the Excel file just have line-breaks in the middle of a cell's value?  If the Excel cells just have LF in them (which is normally how Excel inserts line breaks) then writing the lines with CR+LF at the end and using termstr=CRLF on the INFILE statement should let SAS see all of the data for one observation on one line of the input text file.

 

You are one the right track for how to read the file using a data step, but you need to add the variable definitions and an INPUT statement.  In general I find the structure below for reading multiple files with similar names and same structure to be the easiest to read and maintain.

data want ;
* Tell where to find the files. You can use a single wildcard to read multiple files;
   length fname $255 ;
   infile "F:\B*.csv" dsd lrecl=1000000 truncover filevar=fname termstr=CRLF ;
* Define variables ;
   length firstvar $20 var2 8 .... lastvar $50 ;
* Add FORMAT and/or INFORMAT definition for variables that NEED them ;
   *informat var2 date. ;
   *format var2 date9.;
* Skip header rows ;
   input @;
   if fname ne lag(fname) then input ;
* Read the actual line. Use a positional varlist if variables are defined in order.;
   input firstvar -- lastvar ;
run;   

 

 Editor's note: Also see How to read multiple text files in SAS.

View solution in original post

5 REPLIES 5
Jagadishkatam
Amethyst | Level 16

did you try the import procedure?

 

proc import datafile='F:\B04a.csv' out=want dbms=csv replace;
run;
Thanks,
Jag
dapenDaniel
Obsidian | Level 7

I tried it but this code cannot solve the two problems I talked in my questions. Thanks.

Reeza
Super User

INFILE is a file reference. You have no INPUT statement, which is what actually reads the data. However, in a data step you have to list every variable so you may be better off first using PROC IMPORT, copy the code from the log and then modifying it to read your file. Once you have that working you can use wildcards to read all of the files at once. 

 


@dapenDaniel wrote:

Hello! I have a question about importing CSV file into SAS. 

 

I have more than 300 csv file with identical variables. However, the maximum length for each variable in different file is different. Moreover, I have some extremely large records that has been beyond the length limitation of Excel. So the record is split into 2 rows. To solve these two problems, I added TERMSTR=CRLF and LRECL=32767 (for each file, the number of the observation is less than 30000). My code is following below. However, when I run the code, the log shows that 0 records were read from the infile. What should I do to solve this problems? Is it required to use a format statement to indicate the variable names, type or length?

 

I also consider to use proc import to import CSV file. I can replace lrecl = 32767 with guessingrows = 32767. How about TERMSTR=CRLF? what should I replace this code with?

 

data b04a;
infile 'F:\B04a.csv'
delimiter=","
missover
firstobs=2
TERMSTR=CRLF
lrecl = 32767;
run;

 

NOTE: The infile 'F:\B04a.csv' is:
Filename=F:\B04a.csv,
RECFM=V,LRECL=32767,File Size (bytes)=51416983,
Last Modified=22Sep2018:19:59:59,
Create Time=22Sep2018:19:59:58

NOTE: 0 records were read from the infile 'F:\B04a.csv'.
NOTE: The data set WORK.B04A has 1 observations and 0 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds


 

ballardw
Super User

@dapenDaniel wrote:

Hello! I have a question about importing CSV file into SAS. 

 

I have more than 300 csv file with identical variables. However, the maximum length for each variable in different file is different. Moreover, I have some extremely large records that has been beyond the length limitation of Excel. So the record is split into 2 rows. To solve these two problems, I added TERMSTR=CRLF and LRECL=32767 (for each file, the number of the observation is less than 30000). My code is following below. However, when I run the code, the log shows that 0 records were read from the infile. What should I do to solve this problems? Is it required to use a format statement to indicate the variable names, type or length?

 


1. You say they are "identical variables", I assume that this implies that the files have the same layout. Do you have a document anywhere that explains the limits of the variables or file layout? If not you might try to find one.

2. Be extremely cautious with opening CSV files with Excel. Saving the file from Excel may change the values of certain types of cells.

 

LRECL describes the file record length, not any specific variable.

I would use proc import one time on one of those files. Proc Import will generate data step code to read the file. Copy that from the LOG into the editor. In addition to the LRECL (which Import might set) and the TERMSTR Modify the code to

1) use nicer variables if needed (editor search and replace)

2) specify the lengths of the variables that my need more characters. Change the lengths of the INFORMAT statements generated by proc import.

3) Remove all the FORMAT statements for character variables and check if the ones set for numeric values make sense and adjust accordingly.

 

After you have one of these file read and the data step code tested then you can change the infile file name and the output data set name to read all of the files with the same characteristics.

Tom
Super User Tom
Super User

@dapenDaniel wrote:

Hello! I have a question about importing CSV file into SAS. 

 

I have more than 300 csv file with identical variables. However, the maximum length for each variable in different file is different. Moreover, I have some extremely large records that has been beyond the length limitation of Excel. So the record is split into 2 rows. To solve these two problems, I added TERMSTR=CRLF and LRECL=32767 (for each file, the number of the observation is less than 30000). My code is following below. However, when I run the code, the log shows that 0 records were read from the infile. What should I do to solve this problems? Is it required to use a format statement to indicate the variable names, type or length?


The problem with PROC IMPORT for delimited is that it needs to guess at what the file contains since the file does not have that metadata.  That is why you get different lengths for variables based on the particular set of records in the file it is reading.  So instead write your own data step to read the file and you can define the lengths of each variable.

 

As to the second problem you need to provide more information on what types of changes you are seeing when the values are long.  Are the lines wrapping so that instead of one line with N columns you see two lines one with X columns and another with N-X columns? Multiple lines per observation?

Or does the Excel file just have line-breaks in the middle of a cell's value?  If the Excel cells just have LF in them (which is normally how Excel inserts line breaks) then writing the lines with CR+LF at the end and using termstr=CRLF on the INFILE statement should let SAS see all of the data for one observation on one line of the input text file.

 

You are one the right track for how to read the file using a data step, but you need to add the variable definitions and an INPUT statement.  In general I find the structure below for reading multiple files with similar names and same structure to be the easiest to read and maintain.

data want ;
* Tell where to find the files. You can use a single wildcard to read multiple files;
   length fname $255 ;
   infile "F:\B*.csv" dsd lrecl=1000000 truncover filevar=fname termstr=CRLF ;
* Define variables ;
   length firstvar $20 var2 8 .... lastvar $50 ;
* Add FORMAT and/or INFORMAT definition for variables that NEED them ;
   *informat var2 date. ;
   *format var2 date9.;
* Skip header rows ;
   input @;
   if fname ne lag(fname) then input ;
* Read the actual line. Use a positional varlist if variables are defined in order.;
   input firstvar -- lastvar ;
run;   

 

 Editor's note: Also see How to read multiple text files in SAS.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 16752 views
  • 1 like
  • 5 in conversation