SGB
Obsidian | Level 7

Hi SASperts!

I am looking to identify all the special characters in a variable's value (I am not sure exactly which characters count as special) and suppress them dynamically.

I am working on a very large dataset!

Any inputs?!

Thanks

SGB

6 REPLIES
Reeza
Super User

What are special characters to you?

There are modifiers in the COMPRESS function that let you control what you keep instead of what you reject; that might be worth looking into.

SGB
Obsidian | Level 7

Hi Reeza

Thanks for your time. Yes, I am aware that the COMPRESS function and its options can do the required task.

But in COMPRESS, we do need to specify the special characters, such as ' - _ , . & etc.

Is there an efficient way of identifying all the special characters in a variable's value (as I don't know the entire list for sure), rather than specifying them manually, and suppressing them?

SGB

Linlin
Lapis Lazuli | Level 10

try:

var=compress(var,'','kpw');
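
For anyone reading the modifiers above: 'k' keeps the listed character classes instead of removing them, 'p' adds punctuation marks, and 'w' adds printable characters, so the call keeps printable text and drops non-printable characters such as tabs and carriage returns. A minimal sketch, assuming an ASCII session; the dataset name demo and the variables raw and clean are made up for illustration:

```sas
/* Hypothetical sample: a value with an embedded tab ('09'x) and carriage return ('0D'x) */
data demo;
  length raw clean $40;
  raw = cats('ab', '09'x, 'c-d', '0D'x, 'e');
  /* k = keep the listed classes, p = punctuation, w = printable characters */
  clean = compress(raw, '', 'kpw');
  put clean=;
run;
```

Note that punctuation such as the hyphen survives this call, so it only helps if "special" means non-printable.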

Reeza
Super User

No you don't need to specify each one.

Look at the modifiers in the documentation. For example, the 'p' modifier specifies all punctuation marks, and adding 'k' keeps those characters instead of removing them.

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212246.htm

'kn' might be what you want
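
A small sketch of what 'kn' does, assuming SAS 9.2 or later: 'n' adds digits, the underscore, and English letters to the list, and 'k' keeps them, so everything else (including blanks) is removed. The sample value and the variable names are hypothetical:

```sas
data demo_kn;
  length raw keep_alnum keep_alnum_sp $40;
  raw = 'O''Brien & Co., Ltd. #42_a';
  /* k = keep, n = digits, underscore, and English letters */
  keep_alnum = compress(raw, , 'kn');
  /* adding s also keeps space characters, so words stay separated */
  keep_alnum_sp = compress(raw, , 'kns');
  put keep_alnum= / keep_alnum_sp=;
run;
```

If blanks between words matter, the 'kns' variant is probably closer to what is wanted than plain 'kn'.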

OS2Rules
Obsidian | Level 7

You don't specify which platform you are working on - it makes a difference whether you are running in ASCII or EBCDIC.

We had a similar problem. I had to deconstruct the text of the source file into individual characters and then run a PROC FREQ on them to identify each unique character in the data file. From this I could select the characters I wanted to translate into other characters (blanks in this case).
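
The approach described above could be sketched roughly as follows. This is not the poster's actual code; the dataset have and the variable text are stand-ins for the real data:

```sas
/* Hypothetical input: replace HAVE and TEXT with the real dataset and variable */
data have;
  length text $20;
  text = 'a-b'; output;
  text = 'a!';  output;
run;

/* One output row per character, with its hex code so unprintable values are visible */
data chars(keep=ch hex);
  set have;
  length ch $1 hex $2;
  do i = 1 to length(text);
    ch  = substr(text, i, 1);
    hex = put(ch, $hex2.);
    output;
  end;
run;

/* Frequency of each distinct character found in the data */
proc freq data=chars order=freq;
  tables hex*ch / list missing;
run;
```

This writes one row per character, so on a very large dataset it may be worth running it on a sample first.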

If there are too many to do individually, or the file is just too big, you can try to identify the hex representation of the characters you want to keep. They usually run in a range; in ASCII, for example, '41'x through '5A'x represent uppercase A through Z. You can code to exclude any character < 'A' or > 'Z', or use the hex values. It just takes time to make sure you identify all the characters you need to keep.

ASCII and EBCDIC characters have different hex values so any hex selections you do would be different as well.
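
As an illustration of the range idea, here is a minimal sketch that blanks out every character outside 'A' through 'Z'. It assumes an ASCII session, where that range is contiguous ('41'x through '5A'x); in EBCDIC the uppercase letters are split across several ranges, so a single comparison like this would also let some non-letters through. The dataset and variable names are hypothetical:

```sas
data demo_range;
  length raw clean $40;
  raw = 'ABC-def_123 XYZ!';
  clean = raw;
  do i = 1 to length(clean);
    /* comparison follows the session's collating sequence */
    if not ('A' <= substr(clean, i, 1) <= 'Z') then substr(clean, i, 1) = ' ';
  end;
  put clean=;
  drop i;
run;
```

More ranges (lowercase letters, digits) can be added to the condition in the same way.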
