How to identify all the special character in a variable value and compress them dynamically!

Contributor SGB
Contributor
Posts: 41

Hi SASperts!

I am looking to identify all the special characters in a variable value (I am not sure which special characters are present) and suppress them dynamically.

I am working on a very large dataset!

Any inputs?!

Thanks

SGB

Super User
Posts: 17,868

What are special characters to you?

There are options in the COMPRESS function that let you control what you keep instead of what you reject, which might be worth looking into.

Contributor SGB
Contributor
Posts: 41

Hi Reeza

Thanks for your time. Yes, I am aware that the COMPRESS function and its options can do the required task.

But in compress, we do need to specify the special characters such as ' - _ , . & etc.

Is there an efficient way of identifying all the special characters in a variable value (as I don't know the entire list for sure), rather than specifying them manually, and then suppressing them?

SGB

Super Contributor
Posts: 1,636

try:

var=compress(var,'','kpw');
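For anyone reading later, here is a minimal sketch of what that call does, assuming an ASCII session. The dataset name kpw_demo and the sample value are made up for illustration. With 'k' the listed classes are kept rather than removed, 'p' adds punctuation and 'w' adds printable characters, so control characters such as tabs and line feeds are what get dropped.

```sas
/* Hypothetical demo dataset; the sample value is invented for illustration */
data kpw_demo;
   length raw clean $40;
   /* Embed a tab ('09'x) and a line feed ('0A'x) between printable text */
   raw = 'A-1' || '09'x || 'b_2' || '0A'x || 'c&3';
   /* k = keep, p = punctuation, w = printable characters */
   clean = compress(raw, '', 'kpw');
   put clean=;   /* on ASCII this should show clean=A-1b_2c&3 */
run;
```

Note that punctuation such as - _ & survives this call, so it only helps if "special" means non-printable characters.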

Super User
Posts: 17,868

No, you don't need to specify each one.

Look at the modifiers in the documentation. For example, the 'p' modifier adds all punctuation marks to the character list, and the 'k' modifier keeps the listed characters instead of removing them.

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm

'kn' might be what you want
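A rough comparison of a few modifier combinations, as a sketch only. The dataset name modifier_demo and the sample string are hypothetical. Per the documentation, 'n' covers digits, the underscore and English letters, 'a' covers letters and 'd' covers digits, so 'kad' is the stricter choice if the underscore also counts as special for you.

```sas
/* Hypothetical demo dataset comparing COMPRESS modifier combinations */
data modifier_demo;
   length raw keep_n keep_ad drop_p $40;
   raw = 'AB-12_cd.34&ef';
   /* keep digits, underscore and English letters */
   keep_n  = compress(raw, , 'kn');    /* AB12_cd34ef */
   /* keep letters and digits only, so the underscore goes too */
   keep_ad = compress(raw, , 'kad');   /* AB12cd34ef */
   /* no 'k' here: remove punctuation marks, keep everything else */
   drop_p  = compress(raw, , 'p');
   put keep_n= / keep_ad= / drop_p=;
run;
```

None of these need the list of special characters up front, because you describe what to keep (or which class to drop) rather than naming each character.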

Super Contributor
Posts: 358

You don't specify which platform you are working on - it makes a difference whether you are running in ASCII or EBCDIC.

We had a similar problem. I had to deconstruct the text of the source file into individual characters and then run a PROC FREQ on them to identify each unique character in the data file. From this I could select the characters I wanted to translate into other characters (blanks in this case).
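The character inventory described above could be sketched roughly like this. It is not the exact code we used: the dataset names have and chars and the variable text are placeholders, and the two sample rows are invented so the step runs on its own.

```sas
/* Hypothetical input: WORK.HAVE with one character variable TEXT */
data have;
   length text $20;
   text = 'A-1_b'; output;
   text = 'c&2.d'; output;
run;

/* Split every value into single characters, one row per character */
data chars (keep=ch hex);
   set have;
   length ch $1 hex $2;
   do i = 1 to length(text);
      ch  = substr(text, i, 1);
      hex = put(ch, $hex2.);   /* hex code makes unprintable characters visible */
      output;
   end;
run;

/* Count each distinct character, most frequent first */
proc freq data=chars order=freq;
   tables hex*ch / list missing nocum nopercent;
run;
```

This writes one row per character, so on a very large dataset it may be worth running it on a sample first or restricting it to the variables of interest.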

If there are too many to do individually, or the file is just too big, you can try to identify the hex representation of the characters you want to keep. They usually run in a range; for example, in ASCII '41'x through '5A'x represent uppercase A through Z. You can code to exclude any character < 'A' or > 'Z', or use the hex values. It just takes time to make sure you identify all the characters you need to keep.

ASCII and EBCDIC characters have different hex values so any hex selections you do would be different as well.
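A minimal sketch of the hex-range idea, assuming an ASCII session. The dataset name range_demo and the sample string are hypothetical, and the range '41'x through '5A'x would have to be replaced with the appropriate EBCDIC ranges on a mainframe.

```sas
/* Hypothetical demo: keep only characters inside one hex range (ASCII assumed) */
data range_demo (keep=raw clean);
   length raw clean $40 ch $1;
   raw   = 'AB-12_cd.34&EF';
   clean = '';
   do i = 1 to length(raw);
      ch = substr(raw, i, 1);
      /* '41'x..'5A'x is uppercase A-Z in ASCII only */
      if '41'x <= ch <= '5A'x then clean = cats(clean, ch);
   end;
   put clean=;   /* on ASCII this should show clean=ABEF */
run;
```

If the range you need matches one of the COMPRESS character classes, a call such as compress(raw, , 'ku') (keep uppercase letters) avoids the hard-coded hex values and should not depend on the platform encoding.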
