Importing Extremely Large Text File

Contributor
Posts: 53

Hello all --

I've been having a tough time importing a huge text file: it's around 12GB and roughly 35 million rows. My goal is simply to create an indexed SAS dataset for a data library. The file is CSV-style delimited text with "\001" as the delimiter. I've tried multiple import methods (the wizard, INFILE, etc.) but I can't get any of them to run. I'm looking for tips or suggestions on how to import this file efficiently.

Are there other tools that are better suited for the job?


Accepted Solutions
Solution
03-21-2014 12:50 PM
Community Manager
Posts: 2,693

Re: Importing Extremely Large Text File

When you say "\001" is the delimiter, do you mean 0x01, the SOH (start of heading) character?  I'm not sure the Import Data task can detect/support that.

When using the Import Data task in EG, be sure to click the Performance button on the first page, then check "Bypass the data cleansing process".  This will save quite a bit of overhead that you probably don't need.

If you can mock up a small subset of the data file with a delimiter that EG does support, then you could use the EG task to "design" the import step, then modify a copy of the generated code to use the proper delimiter and point to the actual source file.
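One way to build that small subset is a short DATA step that copies the first few records and swaps the SOH character for a delimiter the task understands. This is a minimal sketch, not EG-generated code: the output path `C:\data\sample.csv`, the 1000-record cutoff, and the choice of comma as the stand-in delimiter are all assumptions for illustration.

```sas
/* Sketch: write the first 1000 records to a small sample file,     */
/* replacing SOH ('01'x) with a comma. Paths are hypothetical.      */
data _null_;
    infile 'C:\data\realbigfile.txt' lrecl=32767 obs=1000;
    file 'C:\data\sample.csv' lrecl=32767;
    input;                                      /* load the raw record into _INFILE_ */
    _infile_ = translate(_infile_, ',', '01'x); /* TRANSLATE(source, to, from)       */
    put _infile_;
run;
```

If the real data can contain commas, pick a different stand-in such as a tab or pipe so the sample still parses into the right columns. Once the task has generated code from the sample, the INFILE statement is the part to change.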

    INFILE 'C:\data\realbigfile.txt'
        LRECL=32767
        ENCODING="WLATIN1"
        DLM='01'x
        MISSOVER
        DSD ;
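For context, here is roughly how that INFILE statement could sit inside a complete DATA step that also builds the index the original question asks for. Treat it as a sketch under stated assumptions: the library path, the data set name `bigfile`, and the three columns (`id`, `name`, `amount`) with their informats are hypothetical placeholders, since the thread does not show the file layout.

```sas
/* Sketch only: library, data set name, columns, and informats are  */
/* hypothetical placeholders, not the layout of the real file.      */
libname mylib 'C:\data\saslib';

data mylib.bigfile (index=(id));   /* build a simple index on id while loading */
    infile 'C:\data\realbigfile.txt'
        lrecl=32767
        encoding="wlatin1"
        dlm='01'x                  /* SOH character as the delimiter */
        missover
        dsd;
    informat id 12. name $50. amount best32.;
    input id name amount;
run;
```

If the file has a header row, adding `firstobs=2` to the INFILE statement skips it. Adding `obs=1000` for a first run is a cheap way to check the column definitions before committing to all 35 million rows.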

If your EG is local and your SAS server is remote, and you need the text file to get "moved" to the server for import, consider using the Copy Files task to perform that part.  However, you'll need a good chunk of temp space to hold a 12GB file.

Chris


All Replies

Contributor
Posts: 53

Re: Importing Extremely Large Text File

Thanks for the response, Chris.  Yes, you're correct that my delimiter is 0x01, the SOH (start of heading) character; sorry for the confusion.  I attempted to import a small subset of my data using the import wizard, and you're right that the Import Data task does not support that type of delimiter. On the INFILE statement, is '01'x the delimiter I want to use in this case?

Community Manager
Posts: 2,693

Re: Importing Extremely Large Text File

Yes, that should do it.

Fun fact: when EG creates a "clean, delimited version" of a raw text file for input, it uses the DEL character ('7F'x) to minimize the chance of conflicts with actual data content.

The Performance setting I described lets you skip that step -- usually a safe option when the incoming data is a clean, well-formed source.

Chris

Contributor
Posts: 53

Re: Importing Extremely Large Text File

Thanks Chris, your solution worked great. Only took about 5 minutes to import.  Much appreciated.

Occasional Contributor
Posts: 16

Re: Importing Extremely Large Text File

I have a similar situation. I am able to get past the import wizard and create a file in SAS EG, but SAS EG opens the imported data set by default and spends hours trying to load it. I merely want to export the data to a saved SAS data set, so I don't need it displayed. I've tinkered with the SAS EG options related to Data, but nothing is helping.

SAS Super FREQ
Posts: 271

Re: Importing Extremely Large Text File

In EG's Tools -> Options -> Results -> Results General, unchecking "Automatically open data or results when generated" should prevent the imported data set from being opened automatically.

Casey

☑ This topic is SOLVED.
