almmotamedi
Obsidian | Level 7

Hi, I am aware that choosing "bypass data cleansing process" in the Data Import Wizard can save quite a bit of time when importing data. However, I am not sure what we might miss in return. What exactly does the "data cleansing process" do, and when should I choose the "bypass" option?

Thank you for your help.

1 ACCEPTED SOLUTION

David_McNamara
SAS Employee

The purpose of the Data Cleansing process is to ensure that the data being transferred to the server is in a state that will not cause issues during either the transfer phase or the importing phase (when the DATA step is creating the data set or table).

 

Regardless of the type of data being imported, the character values are all checked to ensure that all the characters are supported by the SAS server's current encoding (the SAS system option ENCODING=). If any characters are discovered that are not supported, then they are converted to question mark ("?") characters and a NOTE is written to the task log documenting which and how many characters have been converted. Without doing this cleansing, the transfer to the server could fail with a SAS transcoding error (which just means that the transfer process encountered a character not supported by the server's encoding).
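
To make the character check concrete, here is a minimal Python sketch of the idea, not the Enterprise Guide implementation. The function name `cleanse_for_encoding` is hypothetical, and Python codec names such as `latin-1` stand in for the server's ENCODING= value; the NOTE wording is also invented for illustration.

```python
from collections import Counter


def cleanse_for_encoding(text, encoding="latin-1"):
    """Replace characters the target encoding cannot represent with '?'.

    Returns the cleansed text and a Counter of the characters replaced.
    Hypothetical helper: the encoding name is a Python codec name,
    assumed here as a stand-in for the SAS session encoding.
    """
    replaced = Counter()
    out = []
    for ch in text:
        try:
            ch.encode(encoding)       # raises if the codec has no mapping
            out.append(ch)
        except UnicodeEncodeError:
            replaced[ch] += 1
            out.append("?")
    return "".join(out), replaced


cleaned, replaced = cleanse_for_encoding("Zoë paid 5€ for 東京 tea", "latin-1")
print(cleaned)
for ch, n in replaced.items():
    # Illustrative log line only; the real task log format may differ.
    print(f"NOTE: {n} occurrence(s) of U+{ord(ch):04X} converted to '?'")
```

In this example "ë" survives because Latin-1 can represent it, while the euro sign and the two Japanese characters cannot be represented and become question marks.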

 

If character fields in the source data file contain CR, LF or CRLF line terminators, then those are converted to spaces, because those extra line terminators would confuse the DATA step (the one we generate to read the data into the data set) about where each line of data actually ends.
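
As a rough illustration of that step, the Python sketch below parses a delimited file, replaces any line terminator found inside a field with a space, and writes the rows back out one per physical line. The helper name `flatten_line_breaks` is hypothetical, and treating a CRLF pair as a single space is my assumption; the actual task may behave differently.

```python
import csv
import io
import re

# Match CRLF first so the pair becomes one space (an assumption).
LINE_BREAKS = re.compile(r"\r\n|\r|\n")


def flatten_line_breaks(raw):
    """Replace CR, LF and CRLF inside field values with a single space."""
    # newline="" keeps the original terminators so the csv parser sees them.
    reader = csv.reader(io.StringIO(raw, newline=""))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([LINE_BREAKS.sub(" ", field) for field in row])
    return out.getvalue()


raw = 'id,comment\r\n1,"first line\r\nsecond line"\r\n2,plain\r\n'
print(flatten_line_breaks(raw))
```

After this pass every record occupies exactly one line, so a line-oriented reader can no longer mistake an embedded break for the end of a record.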

 

We also attempt to ensure that all character strings are quoted correctly, in a way that the DATA step will be able to input correctly. You would be surprised at the number of different ways data exported from other software is quoted within delimited text files. Occasionally we encounter no quoting where there should be quoted values, so we make an attempt to rectify that during the cleansing process.
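
One way to picture the quoting repair is the Python sketch below, which reads a file that uses single quotes around values and rewrites it with conventional double-quote quoting. This is only an assumed example of "non-standard quoting"; the helper `normalize_quoting` is hypothetical, and the real cleansing logic handles cases this sketch does not, such as detecting values that were never quoted at all.

```python
import csv
import io


def normalize_quoting(raw, source_quotechar="'"):
    """Re-emit delimited text with standard double-quote quoting.

    Fields are parsed using the source file's quote character and then
    written back with '"' quoting applied only where a field needs it.
    """
    reader = csv.reader(io.StringIO(raw, newline=""), quotechar=source_quotechar)
    out = io.StringIO()
    writer = csv.writer(
        out, quotechar='"', quoting=csv.QUOTE_MINIMAL, lineterminator="\n"
    )
    writer.writerows(reader)
    return out.getvalue()


raw = "id,name,note\n1,'Smith, John',said \"hi\"\n"
print(normalize_quoting(raw))
```

The value containing the delimiter ends up wrapped in double quotes, and the embedded double quotes in the last field are doubled, which is the form a comma-delimited reader conventionally expects.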

 

So with this process, our aim is to ensure that we end up with a data file that we can transfer to the server and that the DATA step can process without any server-side errors being returned.

 

I hope this helps clarify the data cleansing process in the Enterprise Guide Import Data task.

 

Regards,

David McNamara

EG Task Development 

almmotamedi
Obsidian | Level 7

Thank you so much for your help.
