mdavidson
Quartz | Level 8

Hello all --

I've been having a tough time importing a huge text file: it's around 12GB and roughly 35 million rows. My goal is simply to create an indexed SAS dataset for a data library. The file is a CSV-style file with "\001" as the delimiter. I've tried multiple methods to import it (wizard, INFILE, etc.) but I can't get anything to run. I'm looking for tips or suggestions on how to import this file efficiently.

Are there other tools that are better suited for the job?

1 ACCEPTED SOLUTION

ChrisHemedinger
Community Manager

When you say "\001" is the delimiter, do you mean 0x01, the SOH (start of heading) character?  I'm not sure the Import Data task can detect/support that.

When using the Import Data task in EG, be sure to click the Performance button on the first page, then check "Bypass the data cleansing process".  This will save quite a bit of overhead that you probably don't need.

If you can mock up a small subset of the data file with a delimiter that EG does support, you could use the EG task to "design" the import step, then modify a copy of the generated code to use the proper delimiter and point to the actual source file. For example, the INFILE statement might end up looking like this:

    INFILE 'C:\data\realbigfile.txt'
        LRECL=32767
        ENCODING="WLATIN1"
        DLM='01'x
        MISSOVER
        DSD ;
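
For context, here is a minimal sketch of how that INFILE statement might sit inside a complete DATA step. The library path, data set name, variable names, lengths, and informat are all hypothetical placeholders, since the thread never shows the file's actual layout. The INDEX= data set option is one possible way to get the indexed data set the original post asks for; it is an assumption, not something the generated EG code is known to include.

```sas
/* Hypothetical permanent library; adjust the path to your own data library. */
libname mylib 'C:\data\saslib';

/* INDEX= builds a simple index on cust_id while the data set is written. */
data mylib.bigfile (index=(cust_id));
    infile 'C:\data\realbigfile.txt'
        lrecl=32767
        encoding="WLATIN1"
        dlm='01'x
        missover
        dsd;

    /* Hypothetical layout: replace with the file's real columns. */
    length cust_id 8 cust_name $ 50 order_date 8 amount 8;
    informat order_date yymmdd10.;
    format order_date date9.;

    /* List input; fields are split on the '01'x delimiter. */
    input cust_id cust_name $ order_date amount;
run;
```

Declaring lengths and informats explicitly, rather than letting a wizard guess them, avoids a separate scan of the file, which matters at 35 million rows.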

If your EG is local and your SAS server is remote, and you need the text file to get "moved" to the server for import, consider using the Copy Files task to perform that part.  However, you'll need a good chunk of temp space to hold a 12GB file.

Chris


6 REPLIES
mdavidson
Quartz | Level 8

Thanks for the response, Chris. Yes, you're correct that my delimiter is 0x01, the SOH (start of heading) character; sorry for the confusion. I tried importing a small subset of my data with the import wizard, and you're right that the Import Data task does not support that type of delimiter. On the INFILE statement, is '01'x the delimiter I want to use in this case?

ChrisHemedinger
Community Manager

Yes, that should do it.
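
If you want to confirm the delimiter before committing to a full 12GB read, a quick check along these lines may help. This is only a sketch: it reuses the example file path from above, reads just the first five records, and counts the '01'x characters in each one. The variable name n_delims is a hypothetical placeholder.

```sas
/* Read only the first 5 records and report how many '01'x delimiters each contains. */
data _null_;
    infile 'C:\data\realbigfile.txt' lrecl=32767 obs=5;
    input;                                  /* load the raw record into _INFILE_ */
    n_delims = countc(_infile_, '01'x);     /* count SOH characters in the record */
    put 'Record ' _n_ 'has ' n_delims 'delimiter(s)';
run;
```

If every record reports the same count (one less than the number of columns), the delimiter is very likely right; a count of zero suggests the file uses something else.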

Fun fact: when EG creates a "clean, delimited version" of a raw text file for input, it uses the DEL character ('7F'x) to minimize the chance of conflicts with actual data content.

The Performance setting I described lets you skip that step -- usually a safe option when the incoming data is a clean, well-formed source.

Chris

mdavidson
Quartz | Level 8

Thanks Chris, your solution worked great. Only took about 5 minutes to import.  Much appreciated.

CurtisSmithDCAA
Obsidian | Level 7

I have a similar situation. I am able to get past the import wizard and create a data set in SAS EG. But SAS EG opens the imported data set by default and spends hours trying to load it. I merely want to save the result as a permanent SAS data set, so I don't need it opened for display. I've tinkered with the SAS EG options related to Data, but nothing is helping.

CaseySmith
SAS Employee

In EG's Tools->Options->Results->Results General, unchecking "Automatically open data or results when generated" should prevent the imported data set from being opened automatically.
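
Once the data set no longer opens automatically, saving it permanently could be done in code. The following is a rough sketch under assumptions not stated in the thread: that the import wrote a data set named IMPORTED to the WORK library, and that a folder exists for a permanent library (both names are hypothetical).

```sas
/* Hypothetical permanent library; adjust the path for your environment. */
libname mylib 'C:\data\saslib';

/* Copy the imported data set from WORK into the permanent library. */
proc copy in=work out=mylib memtype=data;
    select imported;   /* hypothetical name of the data set the import created */
run;
```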

 

Casey

