nflaaten
Calcite | Level 5

I am trying to import a very large text file into SAS. For various reasons, I can't just write and run a data step, so I am attempting to read it in pieces with the Import Wizard, using "Data records starts at" and "Limit the number of records to". The wizard is spending a lot of time examining millions of records for file attributes, then creating a cleansed copy, before writing out to permanent storage. I went into the wizard's Performance options, set the maximum rows to examine for attributes to 500,000, and turned off the cleansed copy, but it is still doing both. Why is this happening, and how do I fix it so that each partition doesn't take hours?

7 REPLIES
Kurt_Bremser
Super User

The ONLY way to fix this is to first copy the file to your SAS environment (Copy Files task) and then read it there with a data step you write, according to the documentation you got with the file.

The Import Wizard wastes far too much time pre-reading the file to determine the code it needs to run on the server after copying it there. And on top of that, you will probably have trouble concatenating the resulting datasets because of differing attributes.
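A minimal sketch of such a data step, assuming a comma-delimited file copied to /sasdata/bigfile.txt; the variable names, lengths, and informats are placeholders for whatever your file documentation says:

/* Read the copied file with a data step; layout here is hypothetical */
data work.bigfile;
    infile '/sasdata/bigfile.txt' dlm=',' dsd truncover firstobs=2;
    length id 8 name $40 amount 8;
    informat txn_date yymmdd10.;
    format txn_date date9.;
    input id name $ amount txn_date;
run;

Because you declare the attributes yourself, every chunk read this way comes out with identical types and lengths, which avoids the concatenation problem.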

Tom
Super User

Upload the text file to the SAS server and use SAS code instead of some Enterprise Guide tool to turn the text file into a dataset.

Even better just place the file in a location that the SAS server can read directly without bothering to involve Enterprise Guide in the transfer process.

 

If you really have no idea how to write the SAS code to read the file, you could ask PROC IMPORT to try to guess for you. But you will have a lot more control over the resulting dataset if you write the code yourself.
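If you do let PROC IMPORT guess, a hedged sketch looks like this; the path and delimiter are assumptions, and GUESSINGROWS= controls how many rows it scans before fixing types and lengths:

/* Let PROC IMPORT guess attributes; path and delimiter are assumed */
proc import datafile='/sasdata/bigfile.txt'
            out=work.guessed
            dbms=dlm
            replace;
    delimiter=',';
    guessingrows=500000;  /* scan this many rows before deciding types */
run;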

 

If you do have to guess how to read it, you might get better performance using this macro instead of PROC IMPORT, if for no other reason than that it lets you use a ZIP or GZIP file directly: https://github.com/sasutils/macros/blob/master/csv2ds.sas
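The SAS feature behind that direct ZIP/GZIP read is the ZIP access method. A minimal sketch of reading a gzipped file in place with your own data step, assuming SAS 9.4M5 or later and a made-up path and layout:

/* Read a gzip file directly; no uncompressed copy is written */
filename gz zip '/sasdata/bigfile.csv.gz' gzip;

data work.fromgz;
    infile gz dlm=',' dsd truncover firstobs=2;
    length id 8 name $40 amount 8;
    input id name $ amount;
run;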

 

 

nflaaten
Calcite | Level 5

I know how to write the code. The problem is that it is almost a terabyte of data, and I don't have room on the storage the SAS server has mounted to run code against it. That is why I am trying to use the Import Wizard: it does not require the mount. If the Performance options in the Import Wizard actually did something, my workaround would work. Hence my question about why they don't. Are they just there to look pretty and confuse the issue, or can I do something to make them work? And yes, I am talking with IT about setting up a mount, but they work at glacial speed.

AlanC
Barite | Level 11

Write the code instead of using the UI. The UI just writes code anyway, and it is best to write a standard data step rather than use PROC IMPORT: control the process. That lets you segment the work into parts (say, 10 datasets instead of 1 large one). You can always treat them as one later, but the ATTRIBs matter: write the ATTRIBs first, then read the data in.
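A minimal sketch of that pattern, with ATTRIBs declared up front and one segment carved out via FIRSTOBS= and OBS= on the INFILE statement; all names, paths, and record ranges here are placeholders:

/* Segment 2: records 1,000,001 through 2,000,000 of the text file */
data work.seg2;
    attrib id     length=8   label='Record ID'
           name   length=$40 label='Name'
           amount length=8   format=comma12.2;
    infile '/sasdata/bigfile.txt' dlm=',' dsd truncover
           firstobs=1000001 obs=2000000;
    input id name $ amount;
run;

Since every segment gets identical attributes, a later SET statement can stack them without type or length conflicts.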

 

A dense but good paper on the issue is this one: Key-Independent Uniform Segmentation of Arbitrary Input Using a Hash Function (sas.com). I wasn't crazy about the title, but it deals with exactly the problem you are facing, and it was written by two of the sharpest people in the SAS world.

https://github.com/savian-net
Tom
Super User

You are probably well beyond the use cases that the EG client-server configuration is designed to support. It also sounds like you are outside the limits your SAS server setup is designed for. Perhaps you need to consider loading the data into an external DBMS and then querying it via SAS/Access to xxxx.
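A hedged sketch of the DBMS route, assuming the data has been loaded into a database reachable through an ODBC DSN called bigdb (engine, DSN, and table names are all assumptions):

/* Query the database copy instead of re-reading the text file */
libname db odbc dsn=bigdb;

proc sql;
    create table work.subset as
    select id, name, amount
    from db.bigtable
    where amount > 0;   /* the filter is pushed to the database */
quit;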

 

If you have the space on your PC to split the file, then split it outside of EG and try uploading the pieces.

ballardw
Super User

If you don't have room for the text file, why do you think you would have room for the SAS data set created from it? Or am I confused about the limits of what you want?

 

If there are elements of the file that you don't need, a data step lets you control that much of the process and keep only the variables and records you need. I have several data sources where I need about 15 of the 200 variables in the text file; I either skip the unwanted columns or read them into a single one-character variable that is dropped, to reduce the size of the output data set.
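A minimal sketch of that dummy-variable trick, keeping three of the columns and funneling the rest into a one-byte variable that is dropped; column names and positions are made up:

/* Keep only the wanted columns; unwanted ones overwrite one dummy */
data work.slim;
    infile '/sasdata/bigfile.txt' dlm=',' dsd truncover firstobs=2;
    length id 8 dummy $1 name $40 amount 8;
    input id dummy $ dummy $ name $ dummy $ amount;
    drop dummy;
run;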

 


@nflaaten wrote: I know how to write the code. The problem is that it is almost a terabyte of data and I don't have room on the server that SAS has mounted ...

Kurt_Bremser
Super User

If you don't have space for the file on the server, then you won't have space for the dataset.

And if you only need a specific subset, it's best to have a mount made from the current file location to the SAS server for reading the file.

