I have a huge CSV pipe delimited file which is 20GB with about 100 millions records. I tried to import it to SAS. However, I only could get 1 million records into SAS. I even could not open the csv file to split the big file into several small files. I will be very appreciate for any suggestions.
If the disk isn't full or there are some quotas set which limit the disk space available to you then it shouldn't be an issue to read the data into a SAS table.
You tell us you get below warning in your code.
Warning: Limit set by ERRORS = option reached. Further errors of this type will not be printed.
This Warning strongly indicates that something with your input statement doesn't work as expected (i.e. you're trying to read data in the .csv as numeric but there are characters at this position).
The SAS log should contain entries telling you on which line and column things go wrong.
SAS stops processing after a defined number of such errors as defined in the ERRORS option.
For you to resolve this issue you need to amend your code. What's required depends on your actual data and if you require further guidance then you need to tell us more about your data, post your actual INFILE/INPUT code as well as the log section which gives us the details of what's going wrong.
And on a side note: Should you want to split up your large source text file into chunks which you can open with a normal text editor then below SAS code could be of help.
/* create large sample text file */
filename tempout "c:\temp\testout.txt" lrecl=1000000;
data _null_;
file tempout;
array vars {10} $20 (10*'abcdefgh|');
do i=1 to 20000000;
put vars(*);
end;
run;
/* split large sample text file into chunks */
%let lines_per_chunk=1000000;
data _null_;
infile tempout lrecl=1000000;
input;
if mod(_n_-1,&lines_per_chunk)=0 then
do;
n+1;
outfile=cats("c:\temp\testout_chunk_",put(n,z6.),".txt");
end;
file dummy filevar=outfile lrecl=1000000;
put _infile_;
run;
Which version of SAS are you running? If you are using the University Edition you may run into a limit on file size as UE is intended for learning the use of SAS and very large data sets look more like commercial use.
If you are using SAS Studio or EG connected to a server you very likely have some sort of file space limit set by your SAS Admin and you would have to work with them to increase your workspace.
Did you get any Warning or Error messages in the log. If so post the code with the messages.
Thank you for your response. I use SAS server in our organization. (My computer is 64-bit 16 GB. The original CSV file is saved in our shared drive which is about 3.7 TB. ) I get a warning message - Warning: Limit set by ERRORS = option reached. Further errors of this type will not be printed.
I wouldn't think 20GB should cause any difficulties. You could try breaking the file up into smaller pieces to see if that makes a difference by using standard linux tools like split.
split --line-bytes=500M --additional-suffix=.csv --numeric-suffixes filename.csv
If the disk isn't full or there are some quotas set which limit the disk space available to you then it shouldn't be an issue to read the data into a SAS table.
You tell us you get below warning in your code.
Warning: Limit set by ERRORS = option reached. Further errors of this type will not be printed.
This Warning strongly indicates that something with your input statement doesn't work as expected (i.e. you're trying to read data in the .csv as numeric but there are characters at this position).
The SAS log should contain entries telling you on which line and column things go wrong.
SAS stops processing after a defined number of such errors as defined in the ERRORS option.
For you to resolve this issue you need to amend your code. What's required depends on your actual data and if you require further guidance then you need to tell us more about your data, post your actual INFILE/INPUT code as well as the log section which gives us the details of what's going wrong.
And on a side note: Should you want to split up your large source text file into chunks which you can open with a normal text editor then below SAS code could be of help.
/* create large sample text file */
filename tempout "c:\temp\testout.txt" lrecl=1000000;
data _null_;
file tempout;
array vars {10} $20 (10*'abcdefgh|');
do i=1 to 20000000;
put vars(*);
end;
run;
/* split large sample text file into chunks */
%let lines_per_chunk=1000000;
data _null_;
infile tempout lrecl=1000000;
input;
if mod(_n_-1,&lines_per_chunk)=0 then
do;
n+1;
outfile=cats("c:\temp\testout_chunk_",put(n,z6.),".txt");
end;
file dummy filevar=outfile lrecl=1000000;
put _infile_;
run;
Thanks everyone spent your precious time to help me. Thank you! Your suggestions, codes, references are so helpful! Thank you and may you are blessed! -Yurie
@Yurie wrote:
Thank you for your response. I use SAS server in our organization. (My computer is 64-bit 16 GB. The original CSV file is saved in our shared drive which is about 3.7 TB. ) I get a warning message - Warning: Limit set by ERRORS = option reached. Further errors of this type will not be printed.
You really have to READ the log (Maxim 2). Your real problem is revealed by the many ERRROR mesages you got concerning invalid data for the input statement. Some of those would be really helpful. Post them into a {i} window to preserve formatting.
The size should not be an issue.
Do you have the storage space for the data set?
Remove the warnings by reading a subset of the file first (stop after say 1,000 then 1000,000 observations) and then read the whole file when there are no warnings or errors.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.