SanderEhmsen
Quartz | Level 8

Hi all

 

I am working on SAS 9.4 M3 (SYSVLONG4 = 9.04.01M3P06242015) and have encountered input data that contains RTF formatting.

The RTF comes in quite a few variants, because it can originate from several different channels. So as far as I can see, there are no quick and dirty fixes.

 

My data can look something like this:

{\rtf1\fbidis\ansi\deff0{\fonttbl{\f0\fswiss\fprq2\fcharset0 Arial;}{\f1\fswiss\fprq2\fcharset0 Calibri;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1 d\ltrpar\cf1\lang1030\f0\fs22 *REAL TEXT*
d\ltrpar\sa200\sl276\slmult1\cf0\f1 *REAL TEXT*
d\ltrpar\cf1\f0
}

 

Where *REAL TEXT* indicates what I am really interested in. 

Are any of you familiar with a SAS function (e.g. user-written) or SAS macro that can strip this RTF formatting?

 

Best,

Sander Ehmsen, Denmark.

ACCEPTED SOLUTION
RW9
Diamond | Level 26

You really are not going to get anywhere with this, I am afraid. There are no simple methods for parsing an RTF file into something usable. Many have tried, I have tried, and all have got some way and then given up.

The file itself is a markup language, and there are loads of tags:

https://www.microsoft.com/en-us/download/details.aspx?id=10725

 

That is the latest spec. Now you could write a parser that takes each tag and finds its closing tag (if there is one), and Perl may help somewhat. But it is a big undertaking. Even output generated from SAS, which is pretty simple as RTF goes, can be very different between different systems.

 

I have also looked at what @Kurt_Bremser mentioned, using another program to convert to another file format. There are ways to get HTML or text output. However, unless it's a very simple file, even that really isn't much help. Tabular output, for instance, which a lot of SAS output is, doesn't carry any indication of position. RTF is literally one page at a time, cell by cell. So you first need to parse the header blocks, then go page by page, extract the information, and then put it all together.

 

I would go back to the source data. It is the best and, with limited time, the only feasible method.
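To give an idea of what "write a parser, take each tag" involves, here is a minimal sketch in Python (not SAS, and not code from anyone in this thread). It assumes simple text-only RTF like the sample in the question, with `\pard` and `\par` written out in full. The names `TOKEN`, `SKIP_DESTINATIONS`, `BREAKS` and `rtf_to_text` are made up for illustration, the list of skipped groups is an assumption rather than a complete list from the spec, and `\'hh` escapes are assumed to be Windows-1252. It does not handle `\uN` Unicode escapes, tables or embedded objects, which is where the real work starts.

```python
import re

# One token per match: a brace, a \'hh hex escape, a control word (with an
# optional numeric parameter and one optional trailing space), a control
# symbol, or a run of plain text.
TOKEN = re.compile(
    r"(?P<open>\{)|(?P<close>\})"
    r"|\\'(?P<hex>[0-9a-fA-F]{2})"
    r"|\\(?P<word>[a-zA-Z]+)(?P<arg>-?\d+)? ?"
    r"|\\(?P<symbol>[^a-zA-Z])"
    r"|(?P<text>[^\\{}]+)"
)

# Groups whose content is not document text (assumed list, not exhaustive).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}
# Control words that stand for whitespace in the output.
BREAKS = {"par": "\n", "line": "\n", "tab": "\t"}


def rtf_to_text(rtf: str) -> str:
    out = []
    stack = []        # "skipping" state saved at each opening brace
    skipping = False  # True while inside a group we want to drop
    for m in TOKEN.finditer(rtf):
        if m.group("open"):
            stack.append(skipping)
        elif m.group("close"):
            skipping = stack.pop() if stack else False
        elif skipping:
            continue
        elif m.group("word"):
            word = m.group("word")
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif word in BREAKS:
                out.append(BREAKS[word])
        elif m.group("hex"):
            raw = bytes.fromhex(m.group("hex"))
            out.append(raw.decode("cp1252", errors="replace"))
        elif m.group("symbol"):
            sym = m.group("symbol")
            if sym == "*":        # {\*\destination ...}: ignorable group
                skipping = True
            elif sym in "\\{}":   # escaped backslash or brace is literal text
                out.append(sym)
        elif m.group("text"):
            # Raw line breaks in the RTF source carry no meaning.
            out.append(m.group("text").replace("\r", "").replace("\n", ""))
    return "".join(out).strip()


SAMPLE = r"""{\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}{\f1\fswiss Calibri;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\ltrpar\cf1\lang1030\f0\fs22 First line\par
\pard\ltrpar\sa200\sl276\slmult1\cf0\f1 Second line\par
\pard\ltrpar\cf1\f0\par
}"""

print(rtf_to_text(SAMPLE))
```

Tracking the brace depth is what lets the font and colour tables be dropped as whole groups, instead of guessing where the real text starts.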


8 REPLIES
andreas_lds
Jade | Level 19

RTF is an output format; I would refuse to write a program that parses RTF.

It might be possible to remove the formatting with some regular expressions, but I don't know enough about RTF to suggest something that actually does the job.

 

Does the interesting part always start after the first blank?
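As a rough illustration of the regular-expression idea, here is a minimal sketch in Python (not from the thread). The patterns are assumptions that cover only simple cases: header groups with at most one level of nesting, `\par` as the paragraph break, and `\'hh` escapes read as Windows-1252. The function name `strip_rtf_regex` is hypothetical. Since SAS's PRXCHANGE function also takes Perl-style regular expressions, similar patterns could probably be adapted there, but that is untested.

```python
import re

# Header groups such as the font and colour tables, allowing one level of
# nested braces inside them (an assumption that fits simple files only).
HEADER_GROUP = re.compile(
    r"\{\\(?:fonttbl|colortbl|stylesheet|info)\b(?:[^{}]|\{[^{}]*\})*\}"
)
STAR_GROUP = re.compile(r"\{\\\*[^{}]*\}")         # {\*\generator ...} etc.
PAR = re.compile(r"\\par\b ?")                     # \par, but not \pard
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")    # \'f8 style characters
CONTROL_WORD = re.compile(r"\\[a-zA-Z]+-?\d* ?")   # \fs22, \ltrpar, ...


def strip_rtf_regex(rtf: str) -> str:
    rtf = HEADER_GROUP.sub("", rtf)
    rtf = STAR_GROUP.sub("", rtf)
    # Raw line breaks in the RTF source are not part of the text.
    rtf = rtf.replace("\r", "").replace("\n", "")
    rtf = PAR.sub("\n", rtf)
    rtf = HEX_ESCAPE.sub(
        lambda m: bytes.fromhex(m.group(1)).decode("cp1252", errors="replace"),
        rtf,
    )
    rtf = CONTROL_WORD.sub("", rtf)
    # Whatever braces remain are group delimiters; escaped braces in the
    # text (\{ and \}) are not handled by this sketch.
    rtf = rtf.replace("{", "").replace("}", "")
    return rtf.strip()


sample = r"""{\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}{\f1\fswiss Calibri;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\ltrpar\cf1\lang1030\f0\fs22 First line\par
\pard\ltrpar\sa200\sl276\slmult1\cf0\f1 Second line\par
\pard\ltrpar\cf1\f0\par
}"""

print(strip_rtf_regex(sample))
```

The order matters: header groups have to go before the control words are removed, otherwise font names like "Arial;" are left behind as if they were text.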

SanderEhmsen
Quartz | Level 8
Refusing is probably not an acceptable solution here. I have tried to manually find patterns in the RTF code, like finding the first blank. I can get something like 90% right by this method. But the last 10% ends up miserably :-).
Kurt_Bremser
Super User

Depending on your environment, use VBA (Windows with MS Office) or shell scripting with OpenOffice (all open platforms) to load the rtf and save it as .txt.

SanderEhmsen
Quartz | Level 8

Thank you for your suggestion.

 

Our SAS will soon run on a Linux platform, and my Data Custodians have refused to implement an RTF parser on that platform.

 

So according to them it is not feasible.

 

Best, 

Sander.

Kurt_Bremser
Super User

Tell them to look up "Mordac the Preventer".

SanderEhmsen
Quartz | Level 8
Thank you very much for your reply.

I have contacted my data provider, and maybe they can strip it at their end.

I might do some tranwrd() and remove the most common RTF codes. It will not get me all the way, but it might be better for my end users.
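The tranwrd() idea amounts to plain literal replacement of a fixed list of codes. Here is a minimal sketch of that in Python (not SAS, purely for illustration). The list `COMMON_CODES` is an assumption taken from the sample at the top of the thread, and `strip_common_codes` is a hypothetical name.

```python
# Codes taken from the sample in the question (an assumed, incomplete list).
COMMON_CODES = [
    r"\viewkind4", r"\lang1030", r"\ltrpar", r"\pard", r"\fs22",
    r"\par", r"\cf0", r"\cf1", r"\uc1", r"\f0", r"\f1",
]


def strip_common_codes(text: str, codes=COMMON_CODES) -> str:
    # Replace the longest codes first, so that removing "\par" does not
    # leave a stray "d" behind from "\pard".
    for code in sorted(codes, key=len, reverse=True):
        text = text.replace(code, "")
    # Collapse the leftover runs of blanks.
    return " ".join(text.split())


print(strip_common_codes(r"\pard\ltrpar\cf1\lang1030\f0\fs22 First line\par"))
print(strip_common_codes(r"\pard\sa200\sl276 Second line\par"))
```

The second call shows the limitation: any code that is not in the list, here `\sa200` and `\sl276`, stays in the text.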
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, they must have some raw data they used to generate the RTF, so that is the best method.
