How to import a 20 GB pipe-delimited CSV file with 100 million records?

Contributor
Posts: 30

How to import a 20 GB pipe-delimited CSV file with 100 million records?

I have a huge pipe-delimited CSV file: 20 GB, with about 100 million records. I tried to import it into SAS, but I could only get 1 million records in. I can't even open the CSV file to split it into several smaller files. I would appreciate any suggestions.


Accepted Solutions
Solution
3 weeks ago
Respected Advisor
Posts: 3,836

Re: How to import a 20 GB pipe-delimited CSV file with 100 million records?

[ Edited ]

@Yurie

If the disk isn't full and there are no quotas limiting the disk space available to you, then reading the data into a SAS table shouldn't be an issue.

 

You tell us your code produces the warning below.

Warning: Limit set by ERRORS= option reached. Further errors of this type will not be printed.

 

This warning strongly indicates that something in your INPUT statement doesn't work as expected (e.g., you're trying to read a field in the .csv as numeric, but it contains characters at that position).

 

The SAS log should contain entries telling you the line and column where things go wrong.

 

SAS stops printing these data-error messages once the limit set by the ERRORS= system option is reached; processing itself continues.
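For example (the count of 50 here is arbitrary, chosen only for illustration), the limit can be raised or lowered with the ERRORS= system option:

```sas
/* Print full data-error messages for up to 50 observations.     */
/* The default limit is 20; 50 is just an illustrative value.    */
options errors=50;
```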

 

To resolve this issue you need to amend your code. What's required depends on your actual data. If you need further guidance, tell us more about your data and post your actual INFILE/INPUT code along with the log section that details what's going wrong.
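As a hypothetical illustration of what such a corrected read could look like (the file path, variable names, and informats below are assumptions, not taken from this thread), a pipe-delimited read typically pairs DLM= with explicit informats:

```sas
/* Sketch only: reading a pipe-delimited file with explicit       */
/* informats. Path, variable names and informats are assumptions; */
/* adjust them to match your actual data.                         */
data work.big;
  infile "c:\temp\bigfile.csv" dlm='|' dsd missover firstobs=2 lrecl=32767;
  input
    id        : 8.
    name      : $50.
    amount    : comma12.
    some_date : yymmdd10.
  ;
  format some_date date9.;
run;
```

The key point is that every variable read as numeric must actually contain numeric text in the file; a stray character in such a field is exactly what triggers the invalid-data errors counted against the ERRORS= limit.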

 

And on a side note: should you want to split your large source text file into chunks that you can open with a normal text editor, the SAS code below could help.

/* create large sample text file */
filename tempout "c:\temp\testout.txt" lrecl=1000000;
data _null_;
  file tempout;
  array vars {10} $20 (10*'abcdefgh|');
  do i=1 to 20000000;
    put vars(*);
  end;
run;

/* split large sample text file into chunks */
%let lines_per_chunk=1000000;
data _null_;
  infile tempout  lrecl=1000000;
  input;
  if mod(_n_-1,&lines_per_chunk)=0 then
    do;
      n+1;
      outfile=cats("c:\temp\testout_chunk_",put(n,z6.),".txt");
    end;
  file dummy filevar=outfile lrecl=1000000;
  put _infile_;
run;



All Replies
Grand Advisor
Posts: 10,211

Re: How to import a 20 GB pipe-delimited CSV file with 100 million records?

Which version of SAS are you running? If you are using the University Edition, you may run into a limit on file size, as UE is intended for learning SAS and very large data sets look more like commercial use.

If you are using SAS Studio or EG connected to a server, you very likely have some sort of file-space limit set by your SAS admin, and you would have to work with them to increase your workspace.

 

Did you get any warning or error messages in the log? If so, post the code along with the messages.

Contributor
Posts: 30

Re: How to import a 20 GB pipe-delimited CSV file with 100 million records?

Thank you for your response. I use a SAS server in our organization. (My computer is a 64-bit machine with 16 GB of RAM; the original CSV file is saved on our shared drive, which is about 3.7 TB.) I get a warning message: Warning: Limit set by ERRORS= option reached. Further errors of this type will not be printed.

Frequent Contributor
Posts: 81

Re: How to import a 20 GB pipe-delimited CSV file with 100 million records?

[ Edited ]

I wouldn't think 20 GB should cause any difficulties. You could try breaking the file up into smaller pieces, using standard Linux tools like split, to see if that makes a difference.

split --line-bytes=500M --additional-suffix=.csv --numeric-suffixes filename.csv
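To sanity-check that invocation before pointing it at the 20 GB file, you can run the same options on a tiny sample first. This sketch assumes GNU coreutils split (the long options are GNU-only); the file names are illustrative:

```shell
# Create a small pipe-delimited sample: 50 lines of 'a|b|c' (6 bytes each).
yes 'a|b|c' | head -n 50 > sample.csv

# Split into chunks of at most 100 bytes without breaking a line in half,
# using the same option style as for the real 500M chunks.
split --line-bytes=100 --additional-suffix=.csv --numeric-suffixes sample.csv chunk_

# The chunks are named chunk_00.csv, chunk_01.csv, ... and every original
# line survives intact, just spread across files.
cat chunk_*.csv | wc -l
```

Because --line-bytes keeps lines whole, each chunk remains a valid piece of the CSV that SAS (or a text editor) can read on its own.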
Contributor
Posts: 30

Re: How to import a 20 GB pipe-delimited CSV file with 100 million records?

Thanks, everyone, for spending your precious time to help me. Your suggestions, code, and references are so helpful! Thank you, and may you be blessed! -Yurie

Esteemed Advisor
Posts: 6,666

Re: How to import a 20 GB pipe-delimited CSV file with 100 million records?


Yurie wrote:

Thank you for your response. I use a SAS server in our organization. (My computer is a 64-bit machine with 16 GB of RAM; the original CSV file is saved on our shared drive, which is about 3.7 TB.) I get a warning message: Warning: Limit set by ERRORS= option reached. Further errors of this type will not be printed.


You really have to READ the log (Maxim 2). Your real problem is revealed by the many ERROR messages you got concerning invalid data for the INPUT statement. Some of those would be really helpful. Post them into a {i} window to preserve formatting.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Trusted Advisor
Posts: 1,510

Re: How to import a 20 GB pipe-delimited CSV file with 100 million records?

The size should not be an issue.

Do you have the storage space for the data set?

Remove the warnings by reading a subset of the file first (stop after, say, 1,000 and then 1,000,000 observations), and then read the whole file once there are no warnings or errors.
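A minimal sketch of that subset approach, assuming a hypothetical file path and variable names (OBS= on the INFILE statement limits how many lines are read):

```sas
/* Sketch only: read just the first 1,000 lines while debugging   */
/* the INPUT statement. Path and variables are assumptions -      */
/* substitute your own, then raise OBS= or drop it entirely.      */
data work.subset;
  infile "c:\temp\bigfile.csv" dlm='|' dsd missover obs=1000;
  input var1 :$50. var2 :8.;
run;
```

Once the subset reads cleanly with no invalid-data notes in the log, removing the OBS= option reads the full file with the same, now-verified, INPUT statement.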

☑ This topic is SOLVED.


Discussion stats
  • 7 replies
  • 154 views
  • 0 likes
  • 6 in conversation