SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Special Symbols

Super Contributor
Posts: 673

Hi,

How can we select all the address fields in a dataset that contain foreign symbols (not the regular symbols like @, #, $), for example French or German characters?

Super User
Posts: 5,437

Re: Special Symbols

Can you be more specific? Which characters are OK, and which are not?

Data never sleeps
Super Contributor
Posts: 673

Re: Special Symbols

This is for addresses in the US:

Characters like . , - space and & are allowed.

French accents are not allowed.

German characters like ö are not allowed.

Super User
Posts: 11,343

Re: Special Symbols

The easiest way would be to build a TRANSLATE function call in a data step. The fun part is getting the correct codes, as your editor font may not match the font you are used to looking at, not to mention potential Unicode or other encoding issues.

Something that might look like this:

string = translate(string,'AAAAA','ÀÁÂÃÄ');

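You really want to look at the documentation for TRANSLATE, as it is a positional replacement: each character in the search string is replaced by the character at the same position in the target string, so the two strings need to line up carefully. The order of the parameters (replacement characters second, characters to search for third) also seems backwards to most people I've discussed this with.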
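
To make the positional behaviour concrete, here is a minimal sketch. The sample address is made up for illustration, and it assumes a single-byte session encoding such as WLATIN1 or LATIN1; in a UTF-8 session accented letters take more than one byte and this approach would likely not behave as shown.

```sas
/* Sketch only: the sample value is hypothetical.                      */
/* Assumes a single-byte session encoding (e.g. WLATIN1 or LATIN1).    */
data _null_;
   length before after $ 40;
   before = '12 Rue de l''Égalité, Köln';
   /* Second argument = replacement characters, third = characters to  */
   /* find. The n-th character of the third argument is replaced by    */
   /* the n-th character of the second, so É->E, é->e, ö->o, À->A, à->a */
   after = translate(before, 'EeoAa', 'ÉéöÀà');
   put before= / after=;
run;
```

If the two lists are not the same length or are out of step by one position, every character after the slip gets mapped to the wrong replacement, which is why it pays to build both strings side by side.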

 

If the input data is straight ASCII or EBCDIC, then this code will build a data set holding, for each single character, the value RANK would return for it and the character itself (displayed in whatever font your viewer uses).

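data chars;
   length character $ 1;
   /* i is the position in the collating sequence, i.e. what RANK returns */
   do i = 127 to 255;
      /* BYTE returns the single character at position i */
      character = byte(i);
      output;
   end;
run;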
Super Contributor
Posts: 673

Re: Special Symbols

That's really cool. But how is it matched against the dataset to see whether any of those characters are present?
Super User
Posts: 11,343

Re: Special Symbols

The use would be in a very stubby bit of code.

data want;
   set have;
   addressline1 = translate(addressline1,'<targetstring>','<searchstring>');
run;

The joy of TRANSLATE is that you have already decided what will be done when those characters are encountered, so you don't need a message unless you really want one. If you were concerned, you could keep the base variable, write the recoded value to a second variable, and then compare the two to generate messages about likely issues in the base data.
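
A rough sketch of that compare-and-report idea follows. HAVE and ADDRESSLINE1 are the names used above; ADDR_CLEAN, RECODED and BAD_POS are names invented here for illustration. The allowed-character list (letters, digits, and the punctuation mentioned earlier in the thread) is an assumption and should be adjusted to your own rules. As before, this assumes a single-byte session encoding.

```sas
/* Sketch only: addr_clean, recoded and bad_pos are illustrative names. */
/* Assumes a single-byte session encoding.                              */
data want;
   set have;
   /* Recoded copy; addressline1 stays untouched as the base value.     */
   /* addr_clean inherits the length of addressline1.                   */
   addr_clean = translate(addressline1, 'AAAAA', 'ÀÁÂÃÄ');
   /* 1 when the recode changed something, 0 otherwise */
   recoded = (addr_clean ne addressline1);
   /* VERIFY returns the position of the first character that is NOT   */
   /* in the list, or 0 when every character is in the list.           */
   /* The list itself is an assumption about what counts as allowed.   */
   bad_pos = verify(addressline1, ' .,-&0123456789'
                    || 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                    || 'abcdefghijklmnopqrstuvwxyz');
   if recoded or bad_pos > 0 then
      put 'NOTE: check row ' _n_ addressline1= bad_pos=;
run;
```

The comparison only catches characters you listed in the TRANSLATE call, while the VERIFY check catches anything outside the allowed list, so the two together give a fuller picture of what is in the base data.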

 

Super User
Posts: 5,437

Re: Special Symbols

Normalisation of US addresses is available within the DataFlux data quality product.