Can I pull variable names on a word file using SAS?

Contributor HG
Contributor
Posts: 23

Hi All,

I have variable names typed in red in a Word document. Can I create a dataset of those red variable names?

Thanks!

Trusted Advisor
Posts: 2,116

Re: Can I pull variable names on a word file using SAS?

If it is a .docx file, it is just a zip archive of XML text files, so you can read the XML and parse out the variable names with SAS code.  Tedious, but it can be done.

SAS Super FREQ
Posts: 8,869

Re: Can I pull variable names on a word file using SAS?

Hi:

  I'm not sure what you mean by RED variable names. How did the variable names get into the WORD document? Do you have a report that you are trying to read out of WORD? You might be able to save your WORD document as a TEXT file and then read the text file with a DATA step program. Whether it is easier to read the XML file or a flat text file is up to you. Of course, all color indicators would be gone from the TEXT file. On the other hand, although the XML file (DOCUMENT.XML) does contain a color indicator, it's expressed as a hexadecimal color value rather than as a color name. See the screen shot below.

  In order to see "down inside" the .DOCX file, I typed some text into Word and saved the file as TEST_RED.DOCX. Then I made a copy of this file and renamed it to TEST_RED.ZIP and then I opened the ZIP archive with WinZip. That allowed me to navigate to  the DOCUMENT.XML file inside the zip archive (since a .DOCX file is just a zip archive) and take this side by side screen shot. The XML is shown in the big rectangle for just the one line in the document. You can see that the string 'This is' is separated from the word 'RED' by many XML tag attributes, and these are the ones which indicate that the color is red:

XML Tag is w:color  Attribute is: w:val="FF0000"
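A minimal sketch of how that attribute could be used, assuming DOCUMENT.XML has already been extracted from the archive by hand to a hypothetical path, that the file fits within the LRECL chosen here, and that each red name sits in a single Word run (Word sometimes splits a word across several runs, which this sketch does not handle). The dataset and variable names are made up for illustration:

```sas
/* Hypothetical path: document.xml extracted by hand from the .docx archive */
filename docxml "C:\temp\document.xml";

data red_runs (keep=red_text);
  length chunk $32767 red_text $200;
  retain rx;
  /* capture the text between <w:t ...> and </w:t> */
  if _n_ = 1 then rx = prxparse('/<w:t(?: [^>]*)?>([^<]*)<\/w:t>/');

  /* treat each closing run tag as a delimiter, so every token ends with one run */
  infile docxml lrecl=1000000 dlmstr='</w:r>';
  input chunk : $32767. @@;

  /* discard paragraph-level markup that precedes the run's opening tag */
  start = max(find(chunk, '<w:r>', -length(chunk)),
              find(chunk, '<w:r ', -length(chunk)));
  if start > 0 then do;
    chunk = substr(chunk, start);
    /* keep the run only if its properties carry the red color value */
    if find(chunk, '<w:color w:val="FF0000"', 'i') and prxmatch(rx, chunk) then do;
      red_text = prxposn(rx, 1, chunk);
      output;
    end;
  end;
run;
```

Only the exact value FF0000 is matched, so text colored with a darker red or a theme color would be missed and the test would need to be widened.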

  But, once you have the variable NAMES, how would you get the variable VALUES? Why do you only need the variable NAMES out of the Word doc?

Just curious,

cynthia


this_is_RED.png
Contributor HG
Contributor
Posts: 23

Re: Can I pull variable names on a word file using SAS?

Posted in reply to Cynthia_sas

Hi Cynthia,

Thank you for your reply. I manually added the variable names to a Word document. The goal is to make sure the variable names on the Word form match the variable names in the SAS dataset.
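Once the names from the form are in a dataset, a check like this could be sketched in PROC SQL. The names WORK.FORM_VARS (one row per name, in a variable called VARNAME) and WORK.CRF_DATA are assumptions for illustration only:

```sas
/* Assumed inputs: WORK.FORM_VARS with variable VARNAME,
   and the dataset to check, here the hypothetical WORK.CRF_DATA */
proc sql;
  create table name_check as
  select coalesce(f.varname, d.name) as varname length=32,
         (f.varname is not null) as on_form,      /* 1 if the name is on the form   */
         (d.name is not null)    as in_dataset    /* 1 if the name is in the dataset */
    from (select distinct upcase(varname) as varname
            from work.form_vars) as f
         full join
         (select upcase(name) as name
            from dictionary.columns
           where libname = 'WORK' and memname = 'CRF_DATA') as d
      on f.varname = d.name
   order by 1;
quit;
```

Rows where ON_FORM and IN_DATASET are not both 1 are the names that appear on only one side.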

Thank you!

SAS Super FREQ
Posts: 8,869

Re: Can I pull variable names on a word file using SAS?

Hi:

  It would be far easier if you had put the variable names into an Excel file instead of a Word document. At least with an Excel file, you could use PROC IMPORT or the LIBNAME statement to read the list of names into a SAS dataset. An Excel worksheet is more like a SAS dataset and SAS would read the worksheet quite well.
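A minimal sketch of the Excel route, assuming a hypothetical workbook with the names in the first column under a header row, and assuming SAS/ACCESS Interface to PC Files is licensed (the XLSX engine depends on it). The path, sheet name, and output dataset name are made up:

```sas
/* Hypothetical workbook: one column of variable names with a header row */
proc import datafile="C:\temp\form_vars.xlsx"
            out=work.form_vars
            dbms=xlsx
            replace;
  sheet="Sheet1";   /* assumed sheet name */
  getnames=yes;     /* first row supplies the SAS variable name */
run;
```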

  The Word document is just that -- a document -- and could contain the Magna Carta, the Gettysburg Address or the full text of Lewis Carroll's poem, Jabberwocky, as well as contain a list of variable names. A document is not the same as a worksheet or a dataset, so you have set yourself a harder task. I believe there have been some previous postings about possibly copying your table to the Windows clipboard using DDE and then reading out of the clipboard, but that sounds like a learning curve, as well.

  Since you typed the variable names into Word, doesn't that mean you already know what the correct variable names should be? Would a simple "low-tech" desk check suffice for your purposes?

 

cynthia

Contributor HG
Contributor
Posts: 23

Re: Can I pull variable names on a word file using SAS?

Posted in reply to Cynthia_sas

Hi Cynthia,

The form is a case report form.

Thanks!

Contributor
Posts: 69

Re: Can I pull variable names on a word file using SAS?

SAS can easily read Word documents.

  • First bookmark the area of the Word document that you want SAS to work with. (It takes 30-seconds with Word Help to learn about bookmarks.)  Tables work best but you can bookmark any section of the body.  The bookmarking process requires that you give each bookmark a unique name.  In effect, it becomes a dataset or table just like a sheet in an Excel workbook.
  • Then use a FILENAME statement with the DDE engine to read the bookmarked table.  This is very similar to using a libref to read a sheet from an Excel workbook.  The fileref lets you control how the bookmarked section is read with INFILE and INPUT statements.  If the fields you want are in table format, just define each column as a variable.  If it's in a paragraph then in your case I'd make each word the same variable and compare them one at a time to the variables in the dataset.
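The two steps above might look roughly like this. It is only a sketch: it assumes Windows, that Word is already running with the document open, that the bookmarked table has two columns, and that table cells arrive tab-delimited. The document path, the bookmark name VarList, and the variable names are all hypothetical:

```sas
/* Assumptions: Word is running with C:\temp\crf.docx open, and the table
   holding the names carries the bookmark VarList (both names hypothetical) */
filename crftab dde 'winword|C:\temp\crf.docx!VarList' notab;

data form_vars;
  length varname $32 prompt $200;
  /* one table row per record, cells separated by tabs */
  infile crftab dlm='09'x dsd missover lrecl=1000;
  input varname prompt;
  varname = upcase(varname);   /* normalize case for later comparison */
run;
```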

Having the variable names in red or bold won't do you any good in my solution because SAS can't tell when the format changes.  I would start or end each of your variable names with a special character that can be stripped off.  If that's not doable then Cynthia's XLS approach would be best I think.
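As an illustration of the special-character idea, here is a small sketch that assumes each variable name on the form is prefixed with '#'. The marker, the sample lines, and the dataset name are made up for the example; in practice the lines would come from the Word document rather than from DATALINES:

```sas
data marked_vars (keep=varname);
  length word $200 varname $32;
  infile datalines truncover;
  input line $char200.;
  /* walk through the blank-separated words on each line */
  do i = 1 to countw(line, ' ');
    word = scan(line, i, ' ');
    if char(word, 1) = '#' then do;
      /* strip the marker and any trailing punctuation, keeping
         only letters, digits and underscores */
      varname = upcase(compress(substr(word, 2), '_', 'kad'));
      if varname ne ' ' then output;
    end;
  end;
datalines;
Subject age #AGE at screening, sex #SEX
Date of visit #VISITDT.
;
run;
```

With the sample lines above, MARKED_VARS should hold AGE, SEX and VISITDT.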

Here is a macro that I use to read a series of tables in Word documents that contain record selection specifications.  The document name is specified in a %let statement before the macro is called but you can easily modify it to accept a document name parameter.  The documents are prepared from a template that contains pre-bookmarked tables.

Attachment
Contributor HG
Contributor
Posts: 23

Re: Can I pull variable names on a word file using SAS?

Posted in reply to bentleyj1

Thank you for your help! I will try it.  

Valued Guide
Posts: 765

Re: Can I pull variable names on a word file using SAS?

hi ... another idea, look at https://communities.sas.com/message/124363

that thread addresses the same issue (read RTF with SAS but again without "seeing" red text) and it does it using DDE, copying all the text to the clipboard, then using the clipboard access method ..., for example ...

* use DDE to place the RTF file in the clipboard;
filename word DDE 'winword|system' notab;

data _null_;
  file word;
  put '[FileOpen .Name = "' "z:\test.rtf" '"]';
  put "[EditSelectAll]";
  put "[EditCopy]";
  put '[FileClose]';
run;

* read the clipboard with a data step, search for text;
filename x clipbrd;

data _null_;
  infile x;
  input;
  * <search variable _INFILE_ for text>;
run;

see also ....

clipboard access ...


http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002571877.htm

good paper ... "Importing Data from Microsoft Word into SAS"

http://www.pharmasug.org/download/papers/CC18.pdf

Contributor
Posts: 69

Re: Can I pull variable names on a word file using SAS?

Yep, Jay Zhou's paper provided a lot of help when I built my macro.  I think I sent him a note saying 'Thanks'.
