09-10-2012 10:27 AM
I want to identify all the special characters in a variable's value (I am not sure exactly which characters count as special) and remove them dynamically.
I am working on a very large dataset!
09-10-2012 10:31 AM
What are special characters to you?
There are options in the COMPRESS function that let you control what you keep instead of what you reject; that might be worth looking into.
09-10-2012 10:41 AM
Thanks for your time. Yes, I am aware that the COMPRESS function and its options can do the required task.
But with COMPRESS, we do need to specify the special characters, such as ' - _ , . & etc.
Is there an efficient way to identify all the special characters in a variable's value (I don't know the entire list for sure) and remove them, rather than specifying each one manually?
09-10-2012 11:27 AM
No, you don't need to specify each one.
Look at the modifiers in the documentation. For example, the 'p' modifier adds all punctuation marks to the list of characters, and the 'k' modifier keeps the listed characters instead of removing them.
'kn' might be what you want: it keeps only letters, digits, and underscores.
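Here is a minimal sketch of those modifiers, in case it helps. The sample string and variable names are made up for illustration. Note that 'kn' also drops blanks, so I add 's' in the last call to keep them.

```sas
data _null_;
  /* Made-up sample value for illustration only */
  raw = "O'Brien & Sons, Ltd. - Unit_7";

  /* 'p' adds punctuation marks to the list of characters to remove */
  no_punct = compress(raw, , 'p');

  /* 'k' keeps the listed characters; 'n' lists digits, underscore, and letters */
  keep_n = compress(raw, , 'kn');

  /* Same idea, but 'a' (letters), 'd' (digits), and 's' (space characters) */
  /* are listed separately so that blanks survive                            */
  keep_ads = compress(raw, , 'kads');

  put no_punct= / keep_n= / keep_ads=;
run;
```

Leaving the second argument empty means the modifiers alone decide which characters are kept or removed, so you never have to type out the list of special characters.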
09-10-2012 01:07 PM
You don't specify which platform you are working on - it makes a difference whether you are running in ASCII or EBCDIC.
We had a similar problem - I had to break the text of the source file into individual characters and then run a PROC FREQ on them to identify each unique character in the data file. From this I could select the characters I wanted to translate into other characters (blanks in this case).
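A rough sketch of that approach, not the exact code we used. The dataset name work.have and the variable name text are placeholders; substitute your own.

```sas
/* work.have and text are hypothetical names for illustration */
data chars (keep=ch hex);
  set work.have;
  length ch $1 hex $2;
  /* Write one output row per character of the value */
  do i = 1 to length(text);
    ch  = substr(text, i, 1);
    hex = put(ch, $hex2.);   /* hex code helps spot unprintable characters */
    output;
  end;
run;

/* Count each distinct character, most frequent first */
proc freq data=chars order=freq;
  tables hex*ch / list missing nocum nopercent;
run;
```

Be aware that this writes one row per character, so on a very large dataset you may want to run it on a sample first.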
If there are too many to do individually, or the file is just too big, you can try to identify the hex representation of the characters you want to keep - they usually run in a range; in ASCII, for example, '41'x through '5A'x represent uppercase A through Z. You can code to exclude any character < 'A' or > 'Z', or use the hex values. It just takes time to make sure you identify all the characters you need to keep.
ASCII and EBCDIC characters have different hex values so any hex selections you do would be different as well.
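As an illustration of the range test, here is a small sketch that assumes an ASCII platform. The sample value and variable names are invented. On EBCDIC the uppercase letters are not one contiguous range ('C1'x-'C9'x, 'D1'x-'D9'x, 'E2'x-'E9'x), so the condition would need to change.

```sas
data _null_;
  /* Invented sample value for illustration only */
  raw = 'ABC-123 def_XYZ!';
  length cleaned $16 ch $1;
  cleaned = raw;

  do i = 1 to length(raw);
    ch = substr(raw, i, 1);
    /* ASSUMPTION: ASCII, where '41'x..'5A'x is uppercase A..Z */
    /* Anything outside that range is overwritten with a blank */
    if ch < '41'x or ch > '5A'x then substr(cleaned, i, 1) = ' ';
  end;

  put cleaned=;
run;
```

To keep more than uppercase letters, add further ranges to the IF condition (digits are '30'x-'39'x and lowercase letters '61'x-'7A'x in ASCII).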