I am trying to import a dataset that has quite a few special characters. When I used EG 4, it was very helpful in cleaning those special characters and telling me:
/* NOTE: The data that was transferred to the SAS server has been 'cleansed' */
/* prior to the transfer so as to reduce the chance of the SAS server */
/* encountering characters not recognized within the constraints of */
/* the server's current locale. */
/* This is an automatic process that has been implemented to allow the */
/* data to be transferred to the server for processing. Without */
/* performing this operation the task would fail and the data would */
/* have been unable to be imported. */
/* The following characters were translated: */
/* Right Single Quotation Mark --> ' 68 times */
/* Left Double Quotation Mark --> " 14 times */
/* Right Double Quotation Mark --> " 14 times */
/* En Dash --> - 20 times */
/* Yen Sign --> Y 15 times */
/* Vulgar Fraction Three Quarters --> 1 times */
Unfortunately, I am trying to run the import with my code instead of in EG and I can't find any calls to this cleansing process. Is there a way I can call this cleanse function in my code before I import rather than use the EG import?
The "cleansing" process is not part of a SAS program. EG performs it on the client side, before the data is sent to the server. It has to work this way: if EG transferred the text file to the server first, it would already be too late, because a text file whose encoding doesn't match the server's session encoding can cause errors as soon as SAS tries to read it.
If your text file is already on the server and you want to "clean" it, you would have to write a DATA step that reads in the text content and writes out lines that don't contain the offending characters.
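As a rough sketch of such a DATA step, the code below copies the file line by line and swaps the characters from the EG note for plain ASCII. It rests on several assumptions. The file is assumed to be single-byte Windows-1252, where the curly quotes, en dash, yen sign and three-quarters sign are the bytes 91 to 94, 96, A5 and BE in hex. The paths and the fileref names rawin and cleaned are placeholders. Replacing the three-quarters sign with a blank is my own choice. If the file is actually UTF-8, these characters are multi-byte sequences and this byte-for-byte mapping would not apply.

```sas
/* Placeholder paths: point these at your own files */
filename rawin   '/path/to/original.txt';
filename cleaned '/path/to/cleaned.txt';

data _null_;
  infile rawin lrecl=32767;
  file cleaned lrecl=32767;
  input;  /* load the next raw line into the _INFILE_ buffer */

  /* TRANSLATE(source, to, from) maps each byte in FROM to the byte */
  /* at the same position in TO. Hex values assume Windows-1252.    */
  _infile_ = translate(_infile_, "''",  '9192'x);    /* curly single quotes -> '  */
  _infile_ = translate(_infile_, '""',  '9394'x);    /* curly double quotes -> "  */
  _infile_ = translate(_infile_, '-Y ', '96A5BE'x);  /* en dash, yen sign, 3/4    */

  put _infile_;  /* write the cleaned line to the output file */
run;
```

You would then point your import code at the cleaned file rather than the original. Any other byte that your session encoding rejects can be added to the FROM and TO lists in the same way.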