BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
telligent
Quartz | Level 8

I read the following very old paper on how to read data from word into SAS using DDE:

https://support.sas.com/resources/papers/proceedings/proceedings/sugi22/SYSARCH/PAPER308.PDF

 

I understand that SAS is discouraging the use of DDE, are there any more modern alternatives?

I am working in EG if it makes a difference.  

Thanks.  

1 ACCEPTED SOLUTION
14 REPLIES 14
Reeza
Super User
Maybe.

If you have the SAS Text Analytics module then that works quite simply from what I understand (never tried). Otherwise, DOCX files are all zipped XML files so I've used that to get to the components though it can be painful. Another approach is to pair SAS & R/Python which do have packages to read from a Word Document. Depending on the docs there can also be options to convert to straight text and import....so really it depends.
ballardw
Super User

It isn't "SAS discouraging use of DDE". It is Microsoft not supporting. Plus whatever mechanisms/memory spaces/other techmagic is used is fragile and other applications will break it. I had one piece of code that used DDE a few years back I used for one project annually. When the DDE didn't work it took me a couple of weeks to determine the issue was Cisco Jabber that had been installed since the last time I used it.

 

I believe if you search this forum you'll find a few other issues with DDE related to age and other applications. DDE also can require code the change depending upon which version of Microsoft Office you have installed, and finding the documents for the lasted version may be problematical because Microsoft doesn't want you to use DDE anymore.

 

It may help to describe exactly what you need to accomplish.

 

telligent
Quartz | Level 8

I want to read data from word tables into sas datasets.  It sounds like DDE will be the easiest.  Thanks for the responses.  I have played around with text analytics in Viya and that imports unstructured data and parses it into parts of speech, that is not what I need.  Saving the file as text would also be pretty unstructured.  I need the bookmarks that DDE uses to get specific data from the word file into specific variables.   

Kurt_Bremser
Super User

I know it's not what you're looking for, but in a data step I can always read line-by-line from a text file until I find a pattern that says "start of table", and then read the tabular data until I see the "end of table" pattern.

RTF might be helpful, as it is text with additional information that could be used to identify the table data.

 

A professionally done data source should always supply the raw data in usable form (CSV files or similar) along with the prettied-up report, IMO.

telligent
Quartz | Level 8

This sounds like a good solution, could you share the code that takes you to the start of table?

 

ballardw
Super User

@telligent wrote:

I want to read data from word tables into sas datasets.  It sounds like DDE will be the easiest.  Thanks for the responses.  I have played around with text analytics in Viya and that imports unstructured data and parses it into parts of speech, that is not what I need.  Saving the file as text would also be pretty unstructured.  I need the bookmarks that DDE uses to get specific data from the word file into specific variables.   


From how many documents? How were the document's tables made? .

If it is not a large number of documents you may find copying the tables from document to spreadsheet and then getting the data from spreadsheet relatively easy (one table per sheet in a workbook). Otherwise the coding for a data step is likely to be challenging.

 

Or just try the DDE and see if it works. But if things go wonky the option is still available. But if this is a production process needed to work going forward then have the data provided in a different format.

Kurt_Bremser
Super User

And, as you are using EG, these two scenarios will prevent the use of DDE, anyway:

  • your SAS runs on a UNIX flavor. No DDE there at all
  • your SAS runs on a Windows srver, but there's no Office installed there (for good reason)

You could only try DDE if your EG works with the "Local" server, or a SAS workspace server on Windows that has Office installed. In which case you need to transfer the document to the server first.

Reeza
Super User
Is this one time or needs to be automated for many? If it's one time I would consider using Adobe to extract the tables - it's quite simple. Tabula or Office have a few ways to convert them to text automatically.

https://www.cnet.com/how-to/use-your-phone-to-turn-a-photo-into-an-excel-spreadsheet-in-a-snap/
AlanC
Barite | Level 11

Please never use DDE. I have spoken about it for years on SAS-L. It was dumped in the 90s because it is simply not very good. For Word, the file (assuming it is a modern format) is a docx. That is a zip file containing XML. Rename the extension to .zip, unzip it, and you will see the contents. Use XML engines at that point which is built into Base. 

 

When I have to do consulting on Word automation with SAS, I read Word using C# and execute SAS via SAS/Integration Technologies. That said, it is a tad more difficult. Chris Hemendinger writes a lot in this area with EG task automation. If you think you have to go for that level, read his book and work. I have spoken about the techniques at SGF as well but my inspiration is Chris so he is best source. 

https://github.com/savian-net
telligent
Quartz | Level 8

C# is above my head.

 

Is there a programmatic way to rename a bunch of word (docx) files to .zip?

I can use winzip to zip them all into a single zip file, but I'm not sure how to save each individual file as a .zip (other than manually).  Thanks. 

 

Also looking for programmatic ways to save the word files to text in order to use the first solution.   

AlanC
Barite | Level 11

C# isn't bad. You should try it a bit and see how it goes.

 

Alternatives would be something like PowerShell to automate the rename. Pretty easy to use. Any other language, such as Python, can also be used. Get the files in a dir, loop through them, renaming and saving out.

https://github.com/savian-net
Kurt_Bremser
Super User

If you have your filenames in a dataset, a  rename can go like this:

data _null_;
set wordfiles;
fvar = catx(" ","ren",fname,scan(fname,1,".")!!".zip","2&1");
infile pipe filevar=fvar end=done;
do until (done);
  input;
  put _infile_;
end;
run;

You will find all eventual responses in the SAS log.

 

For batch operations on documents, LibreOffice provides commandline parameters that let you automate a lot of actions, opening files and saving them to another format among them.

 

Any software that does not provide a sufficient commandline interface for automation is useless **** in my eyes, anyway.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Discussion stats
  • 14 replies
  • 2345 views
  • 8 likes
  • 5 in conversation