Special Symbols

SASPhile · Posted 01-15-2016 02:45 PM

Hi,

How would we select all the address fields in a dataset that contain foreign symbols (not the regular symbols like @, #, $), such as French or German characters?

LinusH · Posted 01-15-2016 03:07 PM

Can you be more specific? Which characters are OK, and which are not?

Data never sleeps

SASPhile · Posted 01-15-2016 04:04 PM

This is for addresses in the US:

Characters like period, comma, hyphen, space and & are allowed.

French accents are not allowed.

German characters like ö are not allowed.

ballardw · Posted 01-15-2016 04:35 PM

The easiest way would be to build a TRANSLATE function call in a data step. The fun part is getting the correct codes, as your editor font may not match the font you are used to looking at, not to mention potential Unicode or other encoding issues.

Something that might look like this:

string = translate(string,'AAAAA','ÀÁÂÃÄ');

You really want to look at the documentation for TRANSLATE, as it is a positional replacement and the target and source strings need to match carefully. Plus, the order of the parameters seems backwards to most people I've discussed this with.
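To make the argument order concrete, here is a minimal sketch. It assumes a single-byte session encoding such as WLATIN1; in a UTF-8 session the accented characters occupy more than one byte and byte-wise TRANSLATE would not behave this way. The variable name city and its value are made up for illustration.

```sas
/* Sketch only: assumes a single-byte session encoding (e.g. WLATIN1). */
/* CITY and its value are hypothetical, not from the original data.    */
data _null_;
   length city $ 20;
   city = 'ÀÉÖ Street';
   /* translate(source, to, from):                        */
   /* the 1st character of FROM is replaced by the 1st    */
   /* character of TO, the 2nd by the 2nd, and so on.     */
   city = translate(city, 'AEO', 'ÀÉÖ');
   put city=;   /* writes the recoded value to the log */
run;
```

Note that the replacement list comes before the search list, which is the "backwards" order mentioned above. Keeping both lists the same length avoids surprises from unmatched positions.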

If the input data is straight ASCII or EBCDIC, then this code will build a data set holding, for each single character, the value that RANK would return (the variable i) and the character itself (displayed in whatever font your viewer defaults to).

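data chars;
   length character $ 1;
   /* i is the position in the collating sequence, i.e. what RANK returns */
   do i= 127 to 255;
      /* COLLATE returns the sequence starting at position i; */
      /* the $1 length keeps only the first character          */
      character =collate(i);
      output;
   end;
run;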

SASPhile · Posted 01-15-2016 04:49 PM

That's really cool. But how is it matched against the dataset to see whether any of the characters are present?

ballardw · Posted 01-15-2016 05:15 PM

The use would be in a very stubby bit of code.

data want;
   set have;
   addressline1 = translate(addressline1,'<targetstring>','<searchstring>');
run;

The joy of TRANSLATE is that you have already decided what will be done when those characters are encountered, so you don't need a message unless you really want one. If you were concerned, you could keep the base variable, create a recoded copy, and then compare the two to generate messages about likely issues in the base data.
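The compare-and-report idea could be sketched roughly as follows. The names addressline1_clean, changed and bad_pos are hypothetical, the character lists are only examples, and a single-byte session encoding is again assumed. The VERIFY check is a different technique from the TRANSLATE comparison: it is added here as one possible way to flag anything outside the allowed set described earlier in the thread (letters, digits, period, comma, hyphen, space and &).

```sas
/* Sketch only: HAVE and ADDRESSLINE1 come from the thread;           */
/* ADDRESSLINE1_CLEAN, CHANGED and BAD_POS are hypothetical names.    */
/* Assumes a single-byte session encoding.                            */
data want;
   set have;

   /* Recode into a new variable so the base value is preserved */
   addressline1_clean = translate(addressline1, 'AAAAA', 'ÀÁÂÃÄ');

   /* 1 when the recode changed something, 0 otherwise */
   changed = (addressline1_clean ne addressline1);

   /* Alternative check: VERIFY returns the position of the first   */
   /* character that is NOT in the allow-list, or 0 if all are fine */
   bad_pos = verify(addressline1,
      'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .,-&');

   /* Report likely issues in the base data to the log */
   if changed or bad_pos > 0 then
      put 'Check row ' _n_ ': ' addressline1= bad_pos=;
run;
```

Subsetting on changed = 1 or bad_pos > 0 afterwards would give just the rows that need review.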

LinusH · Posted 01-16-2016 02:33 AM

Normalisation of US addresses is available within the DataFlux data quality product.

Data never sleeps
