SAS Office Analytics, SAS Add-In for Microsoft Office, and other integrations

import Word document to SAS

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

import Word document to SAS

[ Edited ]

Dear All,

 

I have a word document (see example attachment) that I would like to get into SAS. I would like to extract only the TOC and the table title (this is supposed to be the same but sometimes, titles are edited manually introducing differences/mistakes). The following code imports Example2.docx successfully (following Jay Zhou's paper titled "Importing Data from Microsoft Word into SAS");

OPTIONS NOXWAIT NOXSYNC;
%let rc=%sysfunc(SYSTEM(START WINWORD));

DATA _NULL_;
  X=SLEEP(2); * PAUSE SAS PROCESSING FOR 2 SECONDS WHILE WORD OPENS;
RUN;

FILENAME word DDE "WINWORD|C:\Temp\Example2.docx" NOTAB;
FILENAME word DDE 'WINWORD|SYSTEM';

DATA _NULL_;
  FILE word;
  PUT '[FileOpen.Name = "' "C:\Temp\Example2.docx" '"]';
  PUT "[EditSelectAll]";
  PUT "[EditCopy]";
  PUT '[FileClose]';
  PUT '[AppMinimize]';
RUN;

%let rc=%sysfunc(SYSTEM(START EXCEL));

DATA _NULL_;
  X=SLEEP(2); * PAUSE SAS PROCESSING FOR 2 SECONDS WHILE EXCEL OPENS AND LOADS THE SHEET;
RUN;

FILENAME EXCEL DDE 'EXCEL|SYSTEM';

DATA _NULL_;
FILE excel;
  PUT '[error(false)]';
  PUT '[paste]';
  PUT '[SAVE.AS("C:\temp\temp.xlsx")]';
  PUT '[Select.Last.Cell()]';
  PUT '[Copy]';
  PUT '[quit]';
RUN;

PROC IMPORT OUT = temp
  DATAFILE = "C:\temp\temp.xlsx"
  DBMS = EXCELCS REPLACE;
RUN;

The problem is that the number of tables and the length/size of each table is sometimes very limiting (hundreds of tables, hundreds of pages) in terms of computational time when pasting the copied text from word to Excel.

 

A work around could be to go directly from Word to SAS without intermittently invoking Excel. Something like the following;

OPTIONS NOXWAIT NOXSYNC;
%let rc=%sysfunc(SYSTEM(START WINWORD));

DATA _NULL_;
  X=SLEEP(2); 
RUN;

FILENAME word DDE "WINWORD|C:\Temp\Example2.docx" NOTAB;
FILENAME word DDE 'WINWORD|SYSTEM';

DATA worddata;
  INFILE word;
  INPUT;
RUN;

I have tried a number of things with the last data step but I don't seem to get this to work. How do I efficiently go from Word to SAS avoiding Excel as a bridge? Note that the number of tables will vary between files.

 


Accepted Solutions
Solution
3 weeks ago
Super User
Posts: 17,776

Re: import Word document to SAS

Essentially once you have the word file open, execute a SAVE AS to convert it to a text file. There are probably VBS scripts that could convert the files to text files in batch. 

 

If the files are DOCX you 'could' try to parse the XML if you're feeling adventurous. It would mean changing the extension to .zip, unzipping the file and then finding the components you need. Once you have that set up it should be relatively simple to automate it for all your files.

 

Here's a program I used to convert the DOC files to DOCX. I think you just need to find the 'number' for a text file and change the 12 to that number. It will automatically do all files in the folder. 

https://gist.github.com/statgeek/c7cab12c48133abefc2b4c84f264b290

View solution in original post


All Replies
PROC Star
Posts: 7,357

Re: import Word document to SAS

If you have IML, which by definition can run r, there's an r package for that. see: https://www.r-bloggers.com/using-r-to-get-data-out-of-word-docs/ and https://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjN1oa8-P7UA...

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 17,776

Re: import Word document to SAS

Try using DDE to save the word document as a text file instead of going to Excel and then parse the text files instead. Someone else did a similar project recently so if you search and ask him, maybe he can help out with the code he ended up using.

Super User
Super User
Posts: 7,392

Re: import Word document to SAS

As another alternative, could you not run a VBA script from within Word to export the information you need as a text file for futher processing - doesn't need to be in the same file, have an empty word do, which loads your word doc, processess it and dumps the required information out to a text file.  This would be far simpler than trying to read directly into SAS.  Note that your current code uses DDE which is very old and probably wont work on half the things you would want to.

An example:

https://stackoverflow.com/questions/3628222/extract-headings-and-pagenumber-of-table-of-contents-of-...

Occasional Contributor
Posts: 6

Re: import Word document to SAS

Many thanks for the quick suggestions.

 

For 1, I will try calling R in IML, seems interesting to me (have interacted SAS and R in the not-too-distant past to use the copula package from R). However, I would prefer to have the code fully SAS based because one always needs to configure R to interact with SAS which may be uncomfortable for other users of the macro I am writing.

 

For 2, replacing the excel with a text file via DDE should be a good solution, if I can get it implemented. I have been searching for a while on the topic but am yet to get this.

 

For 3, I have so many of these word files (my solution will be applied to probably hundreds of them). I am not familiar with VBA and have no idea how I would write that script. 

Solution
3 weeks ago
Super User
Posts: 17,776

Re: import Word document to SAS

Essentially once you have the word file open, execute a SAVE AS to convert it to a text file. There are probably VBS scripts that could convert the files to text files in batch. 

 

If the files are DOCX you 'could' try to parse the XML if you're feeling adventurous. It would mean changing the extension to .zip, unzipping the file and then finding the components you need. Once you have that set up it should be relatively simple to automate it for all your files.

 

Here's a program I used to convert the DOC files to DOCX. I think you just need to find the 'number' for a text file and change the 12 to that number. It will automatically do all files in the folder. 

https://gist.github.com/statgeek/c7cab12c48133abefc2b4c84f264b290

Super User
Posts: 17,776

Re: import Word document to SAS

Occasional Contributor
Posts: 6

Re: import Word document to SAS

Hi Reeza,

 

Indeed your script converts all .doc files in a directoy to .docx files. In order to convert to .txt files, not only do I need to change the number '12' in the script   

oWord.ActiveDocument.SaveAs oFile.path & "X", 12

to '2' but I also need to change the naming of the file because the above line only adds an "x" to the extension, so it becomes .docx.

 

How do I specify a complete new extension '.txt'?

 

Super User
Posts: 17,776

Re: import Word document to SAS


GKalema wrote:

Hi Reeza,

 

Indeed your script converts all .doc files in a directoy to .docx files. In order to convert to .txt files, not only do I need to change the number '12' in the script   

oWord.ActiveDocument.SaveAs oFile.path & "X", 12

to '2' but I also need to change the naming of the file because the above line only adds an "x" to the extension, so it becomes .docx.

 

How do I specify a complete new extension '.txt'?

 


I don't know...I would recommend checking StackOverflow or a VBS site instead of SAS at that point, since this wouldn't be considered a SAS question.

Occasional Contributor
Posts: 6

Re: import Word document to SAS

Hi Reeza,

 

Actually the following line of code saves the word file as a txt file

oWord.ActiveDocument.SaveAs oFile.path & ".txt", 2

so that if file was e.g., c:\temp\test.doc, it becomes c:\temp\test.doc.txt

 

Thanks so much for you help.

George

Super User
Posts: 17,776

Re: import Word document to SAS

I'm glad you figured it out Smiley Happy

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 396 views
  • 1 like
  • 4 in conversation