Yurie
Fluorite | Level 6

I have a huge pipe-delimited CSV file: about 20 GB, with roughly 100 million records. I tried to import it into SAS, but only about 1 million records were read in. I also could not open the CSV file to split it into several smaller files. I would greatly appreciate any suggestions.

1 ACCEPTED SOLUTION

Patrick
Opal | Level 21

@Yurie

Unless the disk is full or a quota limits the disk space available to you, it shouldn't be an issue to read the data into a SAS table.

You tell us you get the warning below when running your code.

Warning: Limit set by ERRORS = option reached. Further errors of this type will not be printed.

This warning strongly indicates that something in your INPUT statement doesn't work as expected (e.g. you're trying to read a field in the .csv as numeric, but there are characters at that position).

The SAS log should contain entries telling you on which line and in which column things go wrong.

Once the number of such errors reaches the limit set in the ERRORS= system option, SAS stops printing detailed messages for them; the step keeps running, but further problems of this type are no longer shown in the log.

To resolve this issue, you need to amend your code. What's required depends on your actual data. If you need further guidance, tell us more about your data and post your actual INFILE/INPUT code as well as the log section that shows the details of what's going wrong.
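To make the idea concrete, here is a minimal sketch of an INFILE/INPUT step for a pipe-delimited file. The path, the data set name, and the three columns (id, amount_raw, comment) are all hypothetical, and a header row is assumed, so adjust everything to your real file. The doubtful field is read as character first and converted afterwards, so that values that are not numeric get flagged instead of flooding the log.

```sas
/* Sketch only: path, data set name and columns are hypothetical */
options errors=20;  /* max. number of observations with full error details in the log */

data work.big;
  /* DSD + DLM='|' read pipe-delimited fields; FIRSTOBS=2 assumes a header row */
  infile "c:\temp\bigfile.csv" dlm='|' dsd truncover firstobs=2 lrecl=32767;
  length id 8 amount_raw $32 comment $200;
  input id amount_raw comment;
  /* ?? suppresses the invalid-data note and keeps _ERROR_ from being set */
  amount = input(amount_raw, ?? 32.);
  /* 1 when a non-empty value could not be converted to a number */
  bad_amount = (missing(amount) and not missing(amount_raw));
run;
```

Afterwards something like proc freq on bad_amount should show how many records carry a value that could not be read as numeric.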

 

On a side note: should you want to split your large source text file into chunks that you can open with a normal text editor, the SAS code below could help.

/* create large sample text file */
filename tempout "c:\temp\testout.txt" lrecl=1000000;
data _null_;
  file tempout;
  array vars {10} $20 (10*'abcdefgh|');
  do i=1 to 20000000;
    put vars(*);
  end;
run;

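/* split large sample text file into chunks */
%let lines_per_chunk=1000000;
data _null_;
  infile tempout  lrecl=1000000;
  input;  /* load the next line into _infile_ without parsing it */
  /* at the first line of each chunk, build the name of a new output file */
  if mod(_n_-1,&lines_per_chunk)=0 then
    do;
      n+1;
      outfile=cats("c:\temp\testout_chunk_",put(n,z6.),".txt");
    end;
  /* FILEVAR= switches to a new output file whenever outfile changes */
  file dummy filevar=outfile lrecl=1000000;
  put _infile_;
run;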


7 REPLIES
ballardw
Super User

Which version of SAS are you running? If you are using the University Edition you may run into a limit on file size as UE is intended for learning the use of SAS and very large data sets look more like commercial use.

If you are using SAS Studio or EG connected to a server, you very likely have some sort of file space limit set by your SAS admin, and you would have to work with them to increase your workspace.

Did you get any warning or error messages in the log? If so, post the code with the messages.

Yurie
Fluorite | Level 6

Thank you for your response. I use a SAS server in our organization. (My computer is 64-bit with 16 GB. The original CSV file is saved on our shared drive, which is about 3.7 TB.) I get a warning message: Warning: Limit set by ERRORS = option reached. Further errors of this type will not be printed.

Sven111
Pyrite | Level 9

I wouldn't think 20 GB should cause any difficulties. You could try breaking the file up into smaller pieces with standard Linux tools like split to see if that makes a difference.

split --line-bytes=500M --additional-suffix=.csv --numeric-suffixes filename.csv
Yurie
Fluorite | Level 6

Thanks to everyone who spent their precious time helping me. Thank you! Your suggestions, code, and references are so helpful! Thank you, and may you be blessed!  -Yurie

Kurt_Bremser
Super User

@Yurie wrote:

Thank you for your response. I use SAS server in our organization. (My computer is 64-bit 16 GB. The original CSV file is saved in our shared drive which is about 3.7 TB. )  I get a warning message - Warning: Limit set by ERRORS = option reached. Further errors of this type will not be  printed. 


You really have to READ the log (Maxim 2). Your real problem is revealed by the many ERROR messages you got concerning invalid data for the INPUT statement. Some of those would be really helpful. Post them into a {i} window to preserve formatting.

ChrisNZ
Tourmaline | Level 20

The size should not be an issue.

Do you have the storage space for the data set?

Remove the warnings by reading a subset of the file first (stop after, say, 1,000 and then 1,000,000 observations), and read the whole file only once there are no more warnings or errors.
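A minimal sketch of such a test run, assuming a hypothetical path, a header row, and three made-up columns (id, amount, comment): the OBS= option on the INFILE statement makes SAS stop at the given input line, so the log stays short while you fix the INPUT statement.

```sas
/* Sketch only: path, data set name and columns are hypothetical */
%let test_obs=1000;  /* raise to 1000000, then remove OBS= for the full run */

data work.big_test;
  /* OBS= is the number of the last input line read; FIRSTOBS=2 skips an assumed header row */
  infile "c:\temp\bigfile.csv" dlm='|' dsd truncover firstobs=2 obs=&test_obs lrecl=32767;
  length id 8 amount 8 comment $200;
  input id amount comment;
run;
```

With FIRSTOBS=2 and OBS=1000 the step reads lines 2 through 1000 of the file, i.e. 999 records.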
