HG
Calcite | Level 5

Hi All,

I have variable names typed in red in a Word document. Can I create a dataset of just the red variable names?

Thanks!

9 REPLIES
Doc_Duke
Rhodochrosite | Level 12

If it is a docx file, it is just a zip archive of XML text files, and you can read the XML and parse out the variable names with SAS code.  Tedious, but it can be done.

Cynthia_sas
SAS Super FREQ

Hi:

  I'm not sure what you mean by RED variable names. How did the variable names get into the WORD document? Do you have a report that you are trying to read out of WORD? You might be able to save your WORD document as a TEXT file and then read the text file with a DATA step program. Whether it is easier to read the XML file or a flat text file is up to you. Of course, all color indicators would be gone from the TEXT file. The XML file (DOCUMENT.XML), on the other hand, does contain a color indicator, but it is expressed as a color value rather than a color name. See the screen shot below.

  In order to see "down inside" the .DOCX file, I typed some text into Word and saved the file as TEST_RED.DOCX. Then I made a copy of this file, renamed it to TEST_RED.ZIP, and opened the ZIP archive with WinZip. That allowed me to navigate to the DOCUMENT.XML file inside the zip archive (since a .DOCX file is just a zip archive) and take this side-by-side screen shot. The XML is shown in the big rectangle for just the one line in the document. You can see that the string 'This is' is separated from the word 'RED' by many XML tags and attributes, and this is the one which indicates that the color is red:

XML Tag is w:color  Attribute is: w:val="FF0000"
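
As a rough illustration of picking up that color attribute from SAS, here is a minimal sketch, not a tested solution. It assumes SAS 9.4 or later (for the FILENAME ZIP access method), that the XML sits in the archive as word/document.xml, and that the red text is stored exactly as w:val="FF0000". The file path and the names red_names, name, piece and is_red are made up for the example.

```sas
/* Hypothetical path; a .docx is a zip archive, so read the XML member directly */
filename docx zip 'C:\temp\TEST_RED.docx' member='word/document.xml';

data red_names (keep=name);
  length name $ 200 piece $ 32767;
  retain is_red 0;
  /* Split the XML on '<' so each piece is one tag plus any text after it.  */
  /* LRECL is set high because document.xml is usually one very long line;  */
  /* raise it if the log reports truncated lines.                           */
  infile docx dlm='<' lrecl=1000000;
  input piece :$32767. @@;

  /* a new run starts: forget the previous run's color */
  if piece =: 'w:r>' or piece =: 'w:r ' then is_red = 0;
  /* the run's color tag: flag the run if the value is FF0000 */
  else if piece =: 'w:color ' then
    is_red = (index(upcase(piece), 'W:VAL="FF0000"') > 0);
  /* the run's text: keep it only when the run was flagged red */
  else if is_red and (piece =: 'w:t>' or piece =: 'w:t ') then do;
    name = strip(substr(piece, index(piece, '>') + 1));
    if name ne ' ' then output;
  end;
run;
```

Word sometimes splits a single word across several runs, and other shades of red have other hex values, so the output would need checking against the document.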

  But, once you have the variable NAMES, how would you get the variable VALUES? Why do you only need the variable NAMES out of the Word doc?

Just curious,

cynthia


[Screen shot: this_is_RED.png]
HG
Calcite | Level 5

Hi Cynthia,

Thank you for your reply. I manually added the variable names to a Word document. The goal is to make sure the variable names on the Word form match the variable names in the SAS dataset.
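
Once the names from the form are in a SAS dataset, that check could look something like the sketch below. The dataset names are assumptions for the example: WORK.FORM_NAMES holds one character variable NAME with the names from the form, and WORK.MYDATA is the dataset being checked.

```sas
proc sql;
  /* variable names of the dataset, upper-cased for a case-insensitive match */
  create table ds_names as
    select upcase(name) as name length=32
    from dictionary.columns
    where libname = 'WORK' and memname = 'MYDATA';

  title 'On the form but not in the dataset';
  select upcase(name) as name from form_names
  except
  select name from ds_names;

  title 'In the dataset but not on the form';
  select name from ds_names
  except
  select upcase(name) as name from form_names;
quit;
title;
```

If both listings come back empty, the form and the dataset agree on variable names.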

Thank you!

Cynthia_sas
SAS Super FREQ

Hi:

  It would be far easier if you had put the variable names into an Excel file instead of a Word document. At least with an Excel file, you could use PROC IMPORT or the LIBNAME statement to read the list of names into a SAS dataset. An Excel worksheet is more like a SAS dataset and SAS would read the worksheet quite well.
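
For illustration, a minimal PROC IMPORT sketch might look like this. The workbook path, sheet name and output dataset name are hypothetical, the names are assumed to sit in one column under a header cell called NAME, and DBMS=XLSX assumes SAS/ACCESS Interface to PC Files is licensed.

```sas
/* Hypothetical workbook: one column of variable names with the header NAME */
proc import datafile='C:\temp\form_names.xlsx'
            out=work.form_names
            dbms=xlsx
            replace;
  sheet='Sheet1';
  getnames=yes;   /* use the first row as the SAS variable name */
run;

/* quick look at what was read */
proc print data=work.form_names;
run;
```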

  The Word document is just that -- a document -- and could contain the Magna Carta, the Gettysburg Address or the full text of Lewis Carroll's poem, Jabberwocky, as well as contain a list of variable names. A document is not the same as a worksheet or a dataset, so you have set yourself a harder task. I believe there have been some previous postings about possibly copying your table to the Windows clipboard using DDE and then reading out of the clipboard, but that sounds like a learning curve, as well.

  Since you typed the variable names into Word, doesn't that mean you already know what the correct variable names should be? Would a simple "low-tech" desk check suffice for your purposes?

 

cynthia

HG
Calcite | Level 5

Hi Cynthia,

The form is a case report form.

Thanks!

bentleyj1
Quartz | Level 8

SAS can easily read Word documents.

  • First bookmark the area of the Word document that you want SAS to work with. (It takes 30 seconds with Word Help to learn about bookmarks.)  Tables work best but you can bookmark any section of the body.  The bookmarking process requires that you give each bookmark a unique name.  In effect, it becomes a dataset or table just like a sheet in an Excel workbook.
  • Then use a filename statement with the DDE engine to read the bookmarked table.  This is very similar to using a libref to read a sheet from an Excel workbook.  The fileref lets you control how the bookmarked section is read with INFILE and INPUT statements.  If the fields you want are in table format, just define each column as a variable.  If it's in a paragraph then in your case I'd make each word the same variable and compare them one at a time to the variables in the dataset.
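
The two steps above can be sketched roughly as follows. This is an untested illustration that assumes Windows SAS, Word already running with the document open, and a one-column table bookmarked as varnames; the path, bookmark name and dataset name are invented for the example.

```sas
/* Hypothetical document path and bookmark name in the DDE triplet */
filename crftab dde 'winword|"C:\temp\crf.doc"!varnames' notab;

data form_names;
  length name $ 32;
  /* NOTAB leaves tabs alone, so the tab character is used as the cell delimiter */
  infile crftab dlm='09'x dsd truncover;
  input name $;
  name = upcase(strip(name));
  if name ne ' ';          /* skip empty rows */
run;
```

For a table with several columns, more variables would be listed on the INPUT statement, one per column.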

Having the variable names in red or bold won't do you any good in my solution because SAS can't tell when the format changes.  I would start or end each of your variable names with a special character that can be stripped off.  If that's not doable then Cynthia's XLS approach would be best I think.
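
As a small sketch of the special-character idea, suppose each variable name on the form was typed with a leading '#' and the words have already been read into a dataset. The marker, the dataset WORK.WORDS and its variable WORD are all assumptions for the example.

```sas
data form_names (keep=name);
  set words;                               /* one word per row in WORD */
  length name $ 32;
  if left(word) =: '#';                    /* keep only the marked words */
  name = upcase(compress(word, '# '));     /* strip the marker and blanks */
  if name ne ' ';
run;
```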

Here is a macro that I use to read a series of tables in Word documents that contain record selection specifications.  The document name is specified in a %let statement before the macro is called but you can easily modify it to accept a document name parameter.  The documents are prepared from a template that contains pre-bookmarked tables.

HG
Calcite | Level 5

Thank you for your help! I will try it.  

MikeZdeb
Rhodochrosite | Level 12

hi ... another idea, look at https://communities.sas.com/message/124363

that thread addresses the same issue (read RTF with SAS but again without "seeing" red text) and it does it using DDE, copying all the text to the clipboard, then using the clipboard access method ..., for example ...

* use DDE to place the RTF file in the clipboard;
filename word DDE 'winword|system' notab;

data _null_;
  file word;
  put '[FileOpen .Name = "' "z:\test.rtf" '"]';
  put "[EditSelectAll]";
  put "[EditCopy]";
  put '[FileClose]';
run;

* read the file with a data step, search for text;
filename x clipbrd;

data _null_;
  infile x;
  input;
  * <search variable _INFILE_ for text>;
run;

see also ....

clipboard access ...


http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002571877.htm

good paper ... "Importing Data from Microsoft Word into SAS"

http://www.pharmasug.org/download/papers/CC18.pdf

bentleyj1
Quartz | Level 8

Yep, Jay Zhou's paper provided a lot of help when I built my macro.  I think I sent him a note saying 'Thanks'.
