
How to convert the rtf text/symbols to plain text?

Contributor
Posts: 32


Hello, everyone

 

I have four variables whose values start with {\rtf1\... symbols/text, and I need to find some keywords in them to generate a report. The contents look like this:

 

"{\rtf1\ansi\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans Serif;}{\f1\froman\fcharset2 Symbol;}{\f2\froman Times New Roman;}{\f3\froman\fprq2 Times New Roman;}{\f4\fswiss MS Shell Dlg;}{\f5\froman Times New Roman;}{\f6\fswiss\fprq2 System;}}

{\colortbl\red0\green0\blue0;\red255\green0\blue0;}

\deflang1033\pard\plain\f5\fs20

}"

 

"{\rtf1\ansi\ansicpg1252\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans Serif;}{\f1\froman\fcharset2 Symbol;}{\f2\froman Times New Roman;}{\f3\froman Times New Roman;}{\f4\fswiss\fprq2 System;}{\f5\froman\fprq2 Times New Roman;}}

{\colortbl\red0\green0\blue0;}

\deflang1033\pard\plain\f3\fs20

}"

 

"{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}}

\viewkind4\uc1\pard\f0\fs20\par

}"

 

I do not know what these symbols mean. I need to convert them to plain text so that I can search for the keywords I need. Any suggestions or hints would be much appreciated! Thank you.

 

Respected Advisor
Posts: 4,173

Re: How to convert the rtf text/symbols to plain text?

It looks like you've read in a Word RTF document. How did you end up with this data in the first place?

 

I would try to first extract the text only with non-SAS tools and only then use SAS for further processing. How to do this depends on your environment.

 

You could, for example, use a VB script to extract the text. Apache Tika also does a really good job: https://tika.apache.org/download.html
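If neither of those is an option, a small script can also strip the markup. Below is a minimal Python sketch of that idea, not a full RTF parser: the function name `rtf_to_text`, the list of skipped destinations and the default `cp1252` encoding are my own assumptions for illustration. It drops font/colour tables and control words, turns `\par` into line breaks, and decodes `\'hh` escapes. It does not handle `\u` Unicode escapes, embedded objects or tables.

```python
import re

# Destinations that hold metadata (font tables, colour tables, ...) rather than body text.
# This list is an assumption; extend it for the documents you actually have.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

# One alternative per kind of RTF token, tried in this order.
TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"    # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"    # hex-escaped byte
    r"|\\([^a-z])"             # control symbol (escaped brace, backslash, asterisk, ...)
    r"|([{}])"                 # group open / close
    r"|([^\\{}]+)"             # run of plain text
)


def rtf_to_text(rtf, encoding="cp1252"):
    """Return the visible text of an RTF string (simplified sketch)."""
    out = []
    skip_stack = []    # skip state of the enclosing groups
    skipping = False   # True while inside a metadata destination
    for match in TOKEN.finditer(rtf):
        word, _param, hexbyte, symbol, brace, text = match.groups()
        if brace == "{":
            skip_stack.append(skipping)
        elif brace == "}":
            if skip_stack:
                skipping = skip_stack.pop()
        elif word:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif symbol:
            if symbol == "*":            # marks an ignorable destination
                skipping = True
            elif not skipping and symbol in "{}\\":
                out.append(symbol)       # escaped literal character
        elif hexbyte:
            if not skipping:
                out.append(bytes.fromhex(hexbyte).decode(encoding, errors="replace"))
        elif text and not skipping:
            # Raw line breaks in RTF source are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out).strip()


sample = (r"{\rtf1\ansi{\fonttbl{\f0\fswiss Arial;}}\pard\f0\fs20 "
          r"Patient reports \b severe\b0  headache.\par Follow-up in 2 weeks.\par}")
plain = rtf_to_text(sample)
print(plain)
print("headache" in plain.lower())
```

Once the text is plain, the keyword search is an ordinary substring or regex match. As far as I can tell, the three values quoted in the question contain only font tables, colour tables and formatting commands, so this sketch returns an empty string for them; rows that actually hold text should come back readable.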

 

Contributor
Posts: 32

Re: How to convert the rtf text/symbols to plain text?

Hello, Patrick

 

I just use SAS with an ODBC connection to get the data (it's an Oracle database). My connection code is shown below:

 

libname exports oracle
    path=XXX
    dbprompt=no
    uid=&username.
    password=&pswd.
    schema=XXX;

 

I tried connecting to the data from Excel and Access and got exactly the same text. I will check the link you provided soon. Thank you!

Respected Advisor
Posts: 4,173

Re: How to convert the rtf text/symbols to plain text?

Oh... I see. So that's stored in a CLOB in Oracle. That's gonna be tricky.

 

I've never been in your situation, so I can't speak from experience. Just throwing out some thoughts:

- Everything I proposed in my last post assumed that you have direct access to the RTF document as a file, but that's not the case.

- You would need to read the CLOB into multiple rows in SAS, as a SAS character variable can only hold 32 KB (32,767 bytes). It's possible to do but needs some extra coding.

- There must be a reason that someone stores the RTFs in Oracle. If you're just after something like the number of hits for a search term, then maybe Oracle Text is available and you could run your queries in-database and just get the result back. I've never used Oracle Text, so I'm not sure whether or how this could be called from a remote SAS process.
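To illustrate the 32 KB point: if the text is extracted outside SAS, long documents have to be cut into numbered pieces before each piece fits into one SAS character variable. Here is a rough Python sketch of that splitting step. The function name `split_for_sas` and the UTF-8 default are assumptions of mine; the only fixed number is the 32,767-byte limit.

```python
SAS_MAX_CHAR_BYTES = 32767  # maximum length of a SAS character variable


def split_for_sas(text, max_bytes=SAS_MAX_CHAR_BYTES, encoding="utf-8"):
    """Split text into (sequence_number, piece) rows of at most max_bytes each."""
    data = text.encode(encoding)
    rows = []
    start = 0
    while start < len(data):
        # Decode a slice; an incomplete multi-byte character at the end is dropped
        # here and picked up again at the start of the next piece.
        piece = data[start:start + max_bytes].decode(encoding, errors="ignore")
        used = len(piece.encode(encoding))
        if used == 0:
            raise ValueError("max_bytes is smaller than one encoded character")
        rows.append((len(rows) + 1, piece))
        start += used
    return rows


long_text = "é" * 20000          # 40,000 bytes in UTF-8
rows = split_for_sas(long_text)
print([(seq, len(piece.encode("utf-8"))) for seq, piece in rows])
```

Keeping the sequence number next to the document key lets you put the pieces back in order later. Note that a keyword can straddle a boundary between two pieces, so a search per row can miss it.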

 

What I would try first:

Make things work directly in-database first (using SQL Developer and Oracle Text). Only once that works, try calling it from a SAS session.
