Hi All,
I have variable names in red in a Word document. Can I create a dataset of the red variable names?
Thanks!
If it is a .docx file, it is really a ZIP archive of XML text files, so you can read the main XML part and parse out the variable names with SAS code. Tedious, but it can be done.
Hi:
I'm not sure what you mean by RED variable names. How did the variable names get into the WORD document? Do you have a report that you are trying to read out of WORD? You might be able to save your WORD document as a TEXT file and then read the text file with a DATA step program. Whether it is easier to read the XML file or a flat text file is up to you. Of course, all color indicators would be gone from the TEXT file. On the other hand, although the XML file (DOCUMENT.XML) does contain a color indicator, it is expressed as a hexadecimal RGB color value, not as the word "red". See the screen shot below.
In order to see "down inside" the .DOCX file, I typed some text into Word and saved the file as TEST_RED.DOCX. Then I made a copy of this file, renamed it to TEST_RED.ZIP, and opened the ZIP archive with WinZip. Since a .DOCX file is just a ZIP archive, that allowed me to navigate to the DOCUMENT.XML file inside it and take this side-by-side screen shot. The XML for the one line in the document is shown in the big rectangle. You can see that the string 'This is' is separated from the word 'RED' by many XML tags and attributes, and these are the ones that indicate that the color is red:
XML tag: w:color    Attribute: w:val="FF0000" (the hexadecimal RGB value for pure red)
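A minimal SAS sketch of that idea, assuming the red text uses exactly w:val="FF0000" and that each piece of red text sits in its own w:r run element. It splits the XML on the '>' character and keeps the w:t text of any run whose properties contain the red color tag. The DATALINES are a hypothetical, simplified stand-in for DOCUMENT.XML; the commented FILENAME ZIP lines (SAS 9.4 ZIP access method, path assumed) show one way to point at a real .docx, whose DOCUMENT.XML is often a single very long line, hence the stream-style RECFM=N read. That combination is an untested assumption.

```sas
/* Sketch: list the text of runs whose color is FF0000 (pure red).       */
/* DATALINES below are a simplified, hypothetical document.xml fragment. */
/* For a real .docx (ZIP access method; the path is an assumption):      */
/*   filename docx zip "z:\test_red.docx" member="word/document.xml";    */
/*   ...and replace the INFILE with:  infile docx recfm=n dlm='>';        */
data work.red_text(keep=text);
  infile datalines dlm='>';
  length token $32767 text $200;
  retain red 0;
  /* each token is everything up to the next '>' character */
  input token :$32767. @@;
  /* a new run (<w:r> or <w:r ...>) starts out not red; <w:rPr> does not match */
  if substr(token, 1, 5) = '<w:r ' then red = 0;
  /* the run properties set the font color to pure red */
  if index(token, 'w:color w:val="FF0000"') then red = 1;
  /* run text arrives as "...text</w:t"; keep it only when the run is red */
  pos = index(token, '</w:t');
  if red and pos > 1 then do;
    text = substr(token, 1, pos - 1);
    output;
  end;
datalines;
<w:p><w:r><w:t xml:space="preserve">This is </w:t></w:r>
<w:r><w:rPr><w:color w:val="FF0000"/></w:rPr><w:t>RED</w:t></w:r>
<w:r><w:t xml:space="preserve"> text</w:t></w:r></w:p>
;
run;
```

On this sample, WORK.RED_TEXT gets one row, RED. On a real document, Word may split one word across several runs, may use another shade of red (for example the "Dark Red" standard color), and escapes characters such as & as &amp;, so treat the output as a starting point for a check rather than a finished list.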
But, once you have the variable NAMES, how would you get the variable VALUES? Why do you only need the variable NAMES out of the Word doc?
Just curious,
cynthia
Hi Cynthia,
Thank you for your reply. I manually typed the variable names into a Word document. The goal is to make sure the variable names on the Word form match the variable names in the SAS dataset.
Thank you!
Hi:
It would be far easier if you had put the variable names into an Excel file instead of a Word document. At least with an Excel file, you could use PROC IMPORT or the LIBNAME statement to read the list of names into a SAS dataset. An Excel worksheet is more like a SAS dataset and SAS would read the worksheet quite well.
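Once the names are in a SAS dataset (for example, imported from Excel with PROC IMPORT, which for DBMS=XLSX needs SAS/ACCESS Interface to PC Files), the check described earlier becomes a set comparison against DICTIONARY.COLUMNS. This is a minimal sketch: the workbook path, the VARNAME column, and the WORK.VITALS dataset are hypothetical stand-ins, and DATALINES replace the import so the example runs on its own.

```sas
/* Stand-in for the names read from Excel; with a real workbook, e.g.: */
/*   proc import datafile="z:\crf_names.xlsx" out=work.form_names     */
/*        dbms=xlsx replace; getnames=yes; run;                       */
data work.form_names;
  length varname $32;
  input varname $;
datalines;
VISDT
SYSBP
DIABP
;
run;

/* Hypothetical analysis dataset to check against the form */
data work.vitals;
  visdt = .; sysbp = .; pulse = .;
run;

proc sql;
  /* names on the form that are missing from the dataset */
  create table work.on_form_only as
    select upcase(varname) as name length=32 from work.form_names
    except
    select upcase(name) from dictionary.columns
      where libname = 'WORK' and memname = 'VITALS';

  /* variables in the dataset that do not appear on the form */
  create table work.in_data_only as
    select upcase(name) as name length=32 from dictionary.columns
      where libname = 'WORK' and memname = 'VITALS'
    except
    select upcase(varname) from work.form_names;
quit;
```

With this sample data, WORK.ON_FORM_ONLY holds DIABP and WORK.IN_DATA_ONLY holds PULSE. UPCASE is applied on both sides because DICTIONARY.COLUMNS keeps the case the variable was created with.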
The Word document is just that -- a document -- and could contain the Magna Carta, the Gettysburg Address or the full text of Lewis Carroll's poem, Jabberwocky, as well as a list of variable names. A document is not the same as a worksheet or a dataset, so you have set yourself a harder task. I believe there have been some previous postings about copying your table to the Windows clipboard using DDE and then reading it from the clipboard, but that approach has a learning curve as well.
Since you typed the variable names into Word, doesn't that mean you already know what the correct variable names should be? Would a simple "low-tech" desk check suffice for your purposes?
cynthia
Hi Cynthia,
The form is a case report form.
Thanks!
SAS can easily read Word documents.
Having the variable names in red or bold won't do you any good with my solution, because SAS can't see formatting such as color or bold when it reads the text. I would start or end each of your variable names with a special character that can be stripped off. If that's not doable, then Cynthia's XLS approach would be best, I think.
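A minimal sketch of the marker idea, assuming a hypothetical convention in which each variable name on the form is prefixed with # and the document has been saved as plain text. It uses the PRXNEXT loop to pull every #NAME token out of each line and PRXPOSN to keep just the name; the sample lines in DATALINES stand in for the saved text file.

```sas
/* Sketch: extract names marked with a leading # (hypothetical convention) */
data work.marked_names(keep=varname);
  infile datalines truncover;
  input line $char200.;
  length varname $32;
  retain re;
  /* # followed by a valid SAS name (at most 32 characters) */
  if _n_ = 1 then re = prxparse('/#([A-Za-z_][A-Za-z0-9_]{0,31})/');
  start = 1;
  stop = length(line);
  call prxnext(re, start, stop, line, pos, len);
  do while (pos > 0);
    /* capture buffer 1 holds the name without the # marker */
    varname = prxposn(re, 1, line);
    output;
    call prxnext(re, start, stop, line, pos, len);
  end;
datalines;
Date of visit #VISDT (DD-MON-YYYY)
Systolic BP #SYSBP mmHg  Diastolic BP #DIABP mmHg
Investigator signature
;
run;
```

On the sample lines this yields VISDT, SYSBP and DIABP; the line without a marker contributes nothing, which is the point of choosing a character that does not otherwise appear on the form.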
Here is a macro that I use to read a series of tables in Word documents that contain record selection specifications. The document name is specified in a %let statement before the macro is called, but you can easily modify the macro to accept a document name parameter. The documents are prepared from a template that contains pre-bookmarked tables.
Thank you for your help! I will try it.
hi ... another idea: look at https://communities.sas.com/message/124363
That thread addresses the same issue (reading RTF with SAS, but again without "seeing" the red text). It uses DDE to copy all the text to the clipboard, then reads the clipboard with the clipboard access method. For example ...
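* use DDE to copy the contents of the RTF file to the clipboard;
* (Word must already be running for the DDE link to connect);
filename word DDE 'winword|system' notab;
data _null_;
  file word;
  put '[FileOpen .Name = "' "z:\test.rtf" '"]';
  put "[EditSelectAll]";
  put "[EditCopy]";
  put '[FileClose]';
run;
* read the clipboard with a data step and search for text;
filename x clipbrd;
data _null_;
  infile x;
  input;
  * write each line containing the search text (case-insensitive) to the log;
  * replace 'search text' with the string you are looking for;
  if find(_infile_, 'search text', 'i') then put _infile_;
run;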
See also the SAS documentation on clipboard access:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002571877.htm
A good paper: "Importing Data from Microsoft Word into SAS"
Yep, Jay Zhou's paper provided a lot of help when I built my macro. I think I sent him a note saying 'Thanks'.