We’re smarter together. Learn from this collection of community knowledge and add your expertise.

How does "bypass data cleansing process" work in the Import Data task?

by SAS Employee David_McNamara on ‎12-11-2015 12:19 PM (1,314 Views)

Question

Hi, I am aware that choosing "bypass data cleansing process" in the Data Import Wizard can save a quit bit of time to import data. However, I am not sure what we might miss in return. In fact, what does that "data cleansing process" do exactly?, and when should I choose the "bypass" option?

 

Answer

The purpose of the "data cleansing" process is to ensure that the data being transferred to the server is in a state that will not cause issues during either the transfer phase or the importing phase (when the DATA step is creating the data set or table).

 

Regardless of the type of data being imported, the character values are all checked to ensure that all the characters are supported by the SAS server's current encoding (the SAS system option ENCODING=). If any characters are discovered that are not supported, then they are converted to question mark ("?") characters and a NOTE is written to the task log documenting which and how many characters have been converted. Without doing this cleansing, the transfer to the server could fail with a SAS transcoding error (which just means that the transfer process encountered a character not supported by the server's encoding).

 

If character fields in the source data file contain CR, LF or CRLF line terminators, then those are converted to spaces because those extra line terminators would confuse the DATA step (the one we generate to read the data into the data set) as to where the actual end of each line of data actually was.

 

We also attempt to ensure that all character strings are quoted correctly, in a way that the DATA step will be able to input correctly. You would be surprised at the number of different ways data exported from other software is quoted within delimited text files. Occasionally we encounter no quoting where there should be quoted values, so we make an attempt to rectify that during the cleansing process.

 

So with this process, our aim is to ensure that we end up with a data file that we can transfer to the server and that the DATA step can process without any server-side errors being returned.

 

All of that said, there are costs associated with the cleansing process, and if you know that your data file is well-formed you might wish to bypass this step.  Here are the side effects you might want to avoid:

  • SAS Enterprise Guide must read the entire data file to create the "clean" version, then upload it to your SAS session, which then will use DATA step to read the file again.  This two-pass process takes longer.

  • The Import Data task will generate a program that references the temporary file that SAS Enterprise Guide creates.  If your goal is to create a SAS program that you can use again with the same or similar data file in another environment, you'll need to skip the cleansing process.  The cleansing process is performed by the SAS Enterprise Guide application and isn't represented in your SAS program.

 

To bypass the cleansing process, click Performance on the first screen for the Import Data task, then check Bypass the data cleansing process.

 

importperf.png

 

I hope this helps clarify the data cleansing process in the Enterprise Guide Import Data task.

 

Regards,

David McNamara

EG Task Development 

 

(Editor's note: for more about importing Microsoft Excel files, see How SAS Enterprise Guide imports Excel files.)