bypass data cleansing process


Hi, I am aware that choosing "bypass data cleansing process" in the Data Import Wizard can save quite a bit of time when importing data. However, I am not sure what we might lose in return. What exactly does the "data cleansing process" do, and when should I choose the "bypass" option?

Thank you for your help.


Accepted Solutions
Solution
‎12-14-2015 03:08 PM
SAS Employee
Posts: 2

Re: bypass data cleansing process

The purpose of the Data Cleansing process is to ensure that the data being transferred to the server is in a state that will not cause issues during either the transfer phase or the importing phase (when the DATA step is creating the data set or table).

 

Regardless of the type of data being imported, the character values are all checked to ensure that all the characters are supported by the SAS server's current encoding (the SAS system option ENCODING=). If any characters are discovered that are not supported, then they are converted to question mark ("?") characters and a NOTE is written to the task log documenting which and how many characters have been converted. Without doing this cleansing, the transfer to the server could fail with a SAS transcoding error (which just means that the transfer process encountered a character not supported by the server's encoding).
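The character check described above can be illustrated with a small sketch. This is not the Import Data task's actual code; it is written in Python purely for illustration, the function name `replace_unsupported` is hypothetical, and `latin-1` stands in for whatever the server's ENCODING= value happens to be.

```python
from collections import Counter

def replace_unsupported(text, encoding="latin-1"):
    """Replace characters that `encoding` cannot represent with '?'.

    Returns the cleaned text and a Counter of the characters replaced,
    which is roughly the information the NOTE in the log would report.
    """
    replaced = Counter()
    cleaned = []
    for ch in text:
        try:
            ch.encode(encoding)      # raises if the character is unsupported
            cleaned.append(ch)
        except UnicodeEncodeError:
            replaced[ch] += 1
            cleaned.append("?")
    return "".join(cleaned), replaced

# Assumed sample value: the euro sign is not part of latin-1.
cleaned, replaced = replace_unsupported("price: 5\u20ac", "latin-1")
print(cleaned)                  # price: 5?
print(sum(replaced.values()))   # 1
```

The practical consequence for the "bypass" option is that any such characters reach the server unchanged, so a mismatch between the file and the server encoding may surface as a transcoding error instead of a "?".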

 

If character fields in the source data file contain CR, LF or CRLF line terminators, those are converted to spaces, because the extra line terminators would confuse the DATA step (the one we generate to read the data into the data set) about where each line of data actually ends.
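As a rough illustration of the line-terminator handling described above (again a Python sketch, not the task's real implementation), the following reads a delimited file with the standard `csv` module and replaces terminators found inside field values with a space. The function name `flatten_embedded_newlines` and the sample data are hypothetical, and treating a CRLF pair as a single space is an assumption.

```python
import csv
import io
import re

def flatten_embedded_newlines(raw):
    """Parse delimited text and replace CR, LF or CRLF inside fields with a space."""
    # newline="" keeps the original terminators so the csv module can see them.
    reader = csv.reader(io.StringIO(raw, newline=""))
    return [[re.sub(r"\r\n|\r|\n", " ", field) for field in row] for row in reader]

# Assumed sample: the quoted comment in record 1 spans two physical lines.
raw = 'id,comment\r\n1,"first line\r\nsecond line"\r\n2,"plain"\r\n'
rows = flatten_embedded_newlines(raw)
print(rows)
# [['id', 'comment'], ['1', 'first line second line'], ['2', 'plain']]
```

After this step every record occupies exactly one physical line, which is what a line-oriented reader expects.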

 

We also attempt to ensure that all character strings are quoted correctly, in a way that the DATA step will be able to input correctly. You would be surprised at the number of different ways data exported from other software is quoted within delimited text files. Occasionally we encounter no quoting where there should be quoted values, so we make an attempt to rectify that during the cleansing process.
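One simple case of inconsistent quoting can be sketched as follows. This is only an assumed example, not a description of what the task actually does: it takes a file whose values are wrapped in single quotes and rewrites it with conventional double-quote quoting, using Python's standard `csv` module. The function name `requote` and the sample rows are hypothetical.

```python
import csv
import io

def requote(raw, quotechar="'"):
    """Re-emit delimited text so fields use standard double-quote quoting."""
    reader = csv.reader(io.StringIO(raw, newline=""), quotechar=quotechar)
    out = io.StringIO(newline="")
    # QUOTE_MINIMAL quotes only fields that contain a delimiter, quote or newline.
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Assumed sample: the source software wrapped values in single quotes.
raw = "name,note\nSmith,'likes tea, not coffee'\n"
print(requote(raw), end="")
# name,note
# Smith,"likes tea, not coffee"
```

Values that should have been quoted but were not (for example, an unquoted value containing the delimiter) are harder to repair, because the field boundaries are ambiguous; the sketch does not attempt that case.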

 

So with this process, our aim is to ensure that we end up with a data file that we can transfer to the server and that the DATA step can process without any server-side errors being returned.

 

I hope this helps clarify the data cleansing process in the Enterprise Guide Import Data task.

 

Regards,

David McNamara

EG Task Development 



All Replies
Contributor
Posts: 72

Re: bypass data cleansing process

Thank you so much for your help.

