06-28-2012 12:24 PM
I'm not sure what you mean by RED variable names. How did the variable names get into the WORD document? Do you have a report that you are trying to read out of WORD? You might be able to save your WORD document as a TEXT file and then read the text file with a DATA step program. Whether it is easier to read the XML file or a flat text file is up to you. Of course, all color indicators would be gone from the TEXT file. On the other hand, although the XML file (DOCUMENT.XML) does contain a color indicator, it's expressed as a hexadecimal color value rather than a color name. See the screen shot below.
In order to see "down inside" the .DOCX file, I typed some text into Word and saved the file as TEST_RED.DOCX. Then I made a copy of this file, renamed it to TEST_RED.ZIP, and opened the ZIP archive with WinZip. That allowed me to navigate to the DOCUMENT.XML file inside the zip archive (since a .DOCX file is just a zip archive) and take this side-by-side screen shot. The XML is shown in the big rectangle for just the one line in the document. You can see that the string 'This is' is separated from the word 'RED' by several XML tags and attributes. This is the one that indicates that the color is red:
XML tag: w:color, attribute: w:val="FF0000"
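As a rough illustration of what picking out the red text could look like once the XML is in a SAS character variable, here is a minimal sketch. The XML string is a made-up single run shaped like the one in the screen shot, the dataset and variable names are hypothetical, and getting DOCUMENT.XML out of the archive and into SAS is not shown.

```sas
data red_text;
  length xml $300 text $100;
  /* hypothetical run, shaped like the XML in the screen shot */
  xml = '<w:r><w:rPr><w:color w:val="FF0000"/></w:rPr><w:t>RED</w:t></w:r>';
  /* capture the text of a run whose color attribute is FF0000 */
  re = prxparse('/<w:color w:val="FF0000"\/>.*?<w:t(?: [^>]*)?>(.*?)<\/w:t>/');
  if prxmatch(re, xml) then text = prxposn(re, 1, xml);
  put text=;   /* writes the extracted text to the log */
  drop re;
run;
```

This only handles one run at a time. In a full document the pattern would have to be applied run by run, otherwise the lazy match could start at a color tag in one run and end at the text of a later one.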
But, once you have the variable NAMES, how would you get the variable VALUES? Why do you only need the variable NAMES out of the Word doc?
06-28-2012 03:42 PM
Thank you for your reply. I manually added the variable names to a Word document. The goal is to make sure the variable names on the Word form match the variable names in the SAS dataset.
06-28-2012 04:12 PM
It would be far easier if you had put the variable names into an Excel file instead of a Word document. At least with an Excel file, you could use PROC IMPORT or the LIBNAME statement to read the list of names into a SAS dataset. An Excel worksheet is more like a SAS dataset and SAS would read the worksheet quite well.
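For what it's worth, here is a sketch of how the Excel route might look. The file path, sheet name, column header (VARNAME) and dataset name (WORK.MYDATA) are all assumptions for illustration, and DBMS=XLSX assumes SAS/ACCESS Interface to PC Files is licensed.

```sas
/* hypothetical workbook with one column, headed VARNAME */
proc import datafile="c:\temp\form_vars.xlsx"
            out=work.form_vars
            dbms=xlsx
            replace;
  sheet="Sheet1";
  getnames=yes;
run;

proc sql;
  /* names on the form that are not variables in WORK.MYDATA */
  create table not_in_data as
  select upcase(f.varname) as name
  from work.form_vars as f
  where upcase(f.varname) not in
        (select upcase(name)
         from dictionary.columns
         where libname='WORK' and memname='MYDATA');
quit;
```

An empty NOT_IN_DATA table would mean every name on the form exists in the dataset. Swapping the two sources in the query would give the check in the other direction.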
The Word document is just that -- a document -- and could contain the Magna Carta, the Gettysburg Address or the full text of Lewis Carroll's poem, Jabberwocky, as well as contain a list of variable names. A document is not the same as a worksheet or a dataset, so you have set yourself a harder task. I believe there have been some previous postings about possibly copying your table to the Windows clipboard using DDE and then reading out of the clipboard, but that sounds like a learning curve, as well.
Since you typed the variable names into Word, doesn't that mean you already know what the correct variable names should be? Would a simple "low-tech" desk check suffice for your purposes?
07-03-2012 10:51 AM
SAS can easily read Word documents.
Having the variable names in red or bold won't do you any good in my solution because SAS can't tell when the format changes. I would start or end each of your variable names with a special character that can be stripped off. If that's not doable, then Cynthia's XLS approach would be best, I think.
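To show the marker idea on its own, here is a small sketch that assumes each variable name is prefixed with # (my choice of marker, not anything Word or SAS requires) and that the document text has already been brought in as plain lines. The sample lines and dataset name are made up.

```sas
data form_names (keep=name);
  length word $40 name $32;
  infile datalines truncover;
  input;
  /* walk the blank-delimited words on the line */
  do i = 1 to countw(_infile_, ' ');
    word = scan(_infile_, i, ' ');
    /* keep only words that start with the hypothetical marker */
    if substr(word, 1, 1) = '#' then do;
      name = upcase(compress(word, '#'));
      output;
    end;
  end;
datalines;
Patient age at enrollment #AGE
Sex of the patient #SEX and visit date #VISITDT
;
run;
```

Punctuation typed right after a name (a comma, say) would stay attached to it, so the marker is easiest to work with when each name is followed by a blank.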
Here is a macro that I use to read a series of tables in Word documents that contain record selection specifications. The document name is specified in a %let statement before the macro is called, but you can easily modify it to accept a document name parameter. The documents are prepared from a template that contains pre-bookmarked tables.
07-03-2012 02:13 PM
hi ... another idea, look at https://communities.sas.com/message/124363
that thread addresses the same issue (read RTF with SAS but again without "seeing" red text) and it does it using DDE, copying all the text to the clipboard, then using the clipboard access method ..., for example ...
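* use DDE to place the RTF file in the clipboard (Word must already be running);
filename word DDE 'winword|system' notab;
data _null_; file word;
  put '[FileOpen .Name = "' "z:\test.rtf" '"]';
  put '[EditSelectAll]' / '[EditCopy]';
run;
* read the file with a data step, search for text;
filename x clipbrd;
data _null_;
  infile x; input;
  * <search variable _INFILE_ for text>;
run;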
see also ....
clipboard access ...
good paper ... "Importing Data from Microsoft Word into SAS"