alepage
Barite | Level 11

Hello,

 

I am downloading a file using encoding = "UTF-8" and I have found that there are emoji in the XML file.

I need to remove only the emoji from the xml file without changing the file structure.

 

How do we do that?

Please note that accented characters must be kept in the XML file because the text is in French.

 

Example:

What I have:

<QID2_TEXT>La dame a été très patience et gentille et que ça fait longtemps que je suis assurée avec vous.
Si c’était possible, je me demandais s’il n’y aurais pas possibilité de diminuer le coût de l’assurance? 🙂</QID2_TEXT>

 

What I want:

 

<QID2_TEXT>La dame a été très patience et gentille et que ça fait longtemps que je suis assurée avec vous.
Si c’était possible, je me demandais s’il n’y aurais pas possibilité de diminuer le coût de l’assurance? </QID2_TEXT>

 

 

7 REPLIES
ChrisNZ
Tourmaline | Level 20

You could remove all unwanted characters using Perl.

Example of the Perl syntax to process a string:

 

echo ' Cœur coût de l’assurance? 🙂</QID2_TEXT>' | perl -C -pe 's/[^[:alnum:][:space:][:punct:]]+//g'

This keeps only alphanumeric characters, whitespace and punctuation symbols, and results in:

 

 Cœur coût de l’assurance? </QID2_TEXT>

 

1. Can't you run your SAS session in UTF-8? Your organisation should move to UTF-8 to avoid this kind of headache.

2. Note that the POSIX [:alnum:] class is locale-specific.

 

 

 

 

alepage
Barite | Level 11

I am losing the apostrophe. How can I keep / allow the apostrophe?

 

Cur coût de lassurance? </QID2_TEXT>

ChrisNZ
Tourmaline | Level 20
That's strange. That's not the case for me.
The apostrophe is part of [:punct:], so it should be preserved.
https://www.regular-expressions.info/posixbrackets.html
[:punct:]
Punctuation (and symbols).
!"#$%&'()*+,-./\:;<=>?@[]^_`{|}~
alepage
Barite | Level 11

How do I apply your Perl script to the XML file?

 

Example:

 

perl -C -pe 's/[^èàûéîôÇÉÇÈ"@-_<>[:ascii:][:alnum:][:space:][:punct:]]+//g' /finsys.../VirageSurvey_2.xml

ChrisNZ
Tourmaline | Level 20

The answer is easy to find if you'd just search.

perl -pe 's/[..]//g' < file.xml > file2.xml
alepage
Barite | Level 11

It works well, thank you. But I still have the issue with the missing apostrophes. Do you know a workaround to keep the apostrophe, since punct does not keep it?

ChrisNZ
Tourmaline | Level 20

If you have more characters to conserve, just add them to the list.

Different characters can be used for apostrophes, beyond the single quote that punct preserves, like

 '  

 

 
