Hi, I am aware that choosing "bypass data cleansing process" in the Data Import Wizard can save a quit bit of time to import data. However, I am not sure what we might miss in return. In fact, what does that "data cleansing process" do exactly?, and when should I choose the "bypass" option?
The purpose of the "data cleansing" process is to ensure that the data being transferred to the server is in a state that will not cause issues during either the transfer phase or the importing phase (when the DATA step is creating the data set or table).
Regardless of the type of data being imported, the character values are all checked to ensure that all the characters are supported by the SAS server's current encoding (the SAS system option ENCODING=). If any characters are discovered that are not supported, then they are converted to question mark ("?") characters and a NOTE is written to the task log documenting which and how many characters have been converted. Without doing this cleansing, the transfer to the server could fail with a SAS transcoding error (which just means that the transfer process encountered a character not supported by the server's encoding).
If character fields in the source data file contain CR, LF or CRLF line terminators, then those are converted to spaces because those extra line terminators would confuse the DATA step (the one we generate to read the data into the data set) as to where the actual end of each line of data actually was.
We also attempt to ensure that all character strings are quoted correctly, in a way that the DATA step will be able to input correctly. You would be surprised at the number of different ways data exported from other software is quoted within delimited text files. Occasionally we encounter no quoting where there should be quoted values, so we make an attempt to rectify that during the cleansing process.
So with this process, our aim is to ensure that we end up with a data file that we can transfer to the server and that the DATA step can process without any server-side errors being returned.
All of that said, there are costs associated with the cleansing process, and if you know that your data file is well-formed you might wish to bypass this step. Here are the side effects you might want to avoid:
To bypass the cleansing process, click Performance on the first screen for the Import Data task, then check Bypass the data cleansing process.
I hope this helps clarify the data cleansing process in the Enterprise Guide Import Data task.
EG Task Development
(Editor's note: for more about importing Microsoft Excel files, see How SAS Enterprise Guide imports Excel files.)
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.