BookmarkSubscribeRSS Feed

How does "bypass data cleansing process" work in the Import Data task?

Started ‎12-11-2015 by
Modified ‎12-11-2015 by
Views 3,055

Question

Hi, I am aware that choosing "bypass data cleansing process" in the Data Import Wizard can save a quit bit of time to import data. However, I am not sure what we might miss in return. In fact, what does that "data cleansing process" do exactly?, and when should I choose the "bypass" option?

 

Answer

The purpose of the "data cleansing" process is to ensure that the data being transferred to the server is in a state that will not cause issues during either the transfer phase or the importing phase (when the DATA step is creating the data set or table).

 

Regardless of the type of data being imported, the character values are all checked to ensure that all the characters are supported by the SAS server's current encoding (the SAS system option ENCODING=). If any characters are discovered that are not supported, then they are converted to question mark ("?") characters and a NOTE is written to the task log documenting which and how many characters have been converted. Without doing this cleansing, the transfer to the server could fail with a SAS transcoding error (which just means that the transfer process encountered a character not supported by the server's encoding).

 

If character fields in the source data file contain CR, LF or CRLF line terminators, then those are converted to spaces because those extra line terminators would confuse the DATA step (the one we generate to read the data into the data set) as to where the actual end of each line of data actually was.

 

We also attempt to ensure that all character strings are quoted correctly, in a way that the DATA step will be able to input correctly. You would be surprised at the number of different ways data exported from other software is quoted within delimited text files. Occasionally we encounter no quoting where there should be quoted values, so we make an attempt to rectify that during the cleansing process.

 

So with this process, our aim is to ensure that we end up with a data file that we can transfer to the server and that the DATA step can process without any server-side errors being returned.

 

All of that said, there are costs associated with the cleansing process, and if you know that your data file is well-formed you might wish to bypass this step.  Here are the side effects you might want to avoid:

  • SAS Enterprise Guide must read the entire data file to create the "clean" version, then upload it to your SAS session, which then will use DATA step to read the file again.  This two-pass process takes longer.

  • The Import Data task will generate a program that references the temporary file that SAS Enterprise Guide creates.  If your goal is to create a SAS program that you can use again with the same or similar data file in another environment, you'll need to skip the cleansing process.  The cleansing process is performed by the SAS Enterprise Guide application and isn't represented in your SAS program.

 

To bypass the cleansing process, click Performance on the first screen for the Import Data task, then check Bypass the data cleansing process.

 

importperf.png

 

I hope this helps clarify the data cleansing process in the Enterprise Guide Import Data task.

 

Regards,

David McNamara

EG Task Development 

 

(Editor's note: for more about importing Microsoft Excel files, see How SAS Enterprise Guide imports Excel files.)

Comments

@David_McNamara

Hi Dave,

 

Thanks for the above Information.  So  if  the  Input File  is not well  formed and  i try to  bypass the Cleaning process in SAS EG Import  process then i  will end with  Errors .  Is this correct?

 

 

Thanks,

Prashant

Version history
Last update:
‎12-11-2015 12:19 PM
Updated by:

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels
Article Tags