I read the following very old paper on how to read data from word into SAS using DDE:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi22/SYSARCH/PAPER308.PDF
I understand that SAS is discouraging the use of DDE, are there any more modern alternatives?
I am working in EG if it makes a difference.
Thanks.
Save the .doc as .txt, and then see if you can find something that indicates the table. I would also have to do that on an ad hoc basis.
It isn't "SAS discouraging use of DDE". It is Microsoft not supporting. Plus whatever mechanisms/memory spaces/other techmagic is used is fragile and other applications will break it. I had one piece of code that used DDE a few years back I used for one project annually. When the DDE didn't work it took me a couple of weeks to determine the issue was Cisco Jabber that had been installed since the last time I used it.
I believe if you search this forum you'll find a few other issues with DDE related to age and other applications. DDE also can require code the change depending upon which version of Microsoft Office you have installed, and finding the documents for the lasted version may be problematical because Microsoft doesn't want you to use DDE anymore.
It may help to describe exactly what you need to accomplish.
Save the document to a pure text file, and read that.
I want to read data from word tables into sas datasets. It sounds like DDE will be the easiest. Thanks for the responses. I have played around with text analytics in Viya and that imports unstructured data and parses it into parts of speech, that is not what I need. Saving the file as text would also be pretty unstructured. I need the bookmarks that DDE uses to get specific data from the word file into specific variables.
I know it's not what you're looking for, but in a data step I can always read line-by-line from a text file until I find a pattern that says "start of table", and then read the tabular data until I see the "end of table" pattern.
RTF might be helpful, as it is text with additional information that could be used to identify the table data.
A professionally done data source should always supply the raw data in usable form (CSV files or similar) along with the prettied-up report, IMO.
This sounds like a good solution, could you share the code that takes you to the start of table?
Save the .doc as .txt, and then see if you can find something that indicates the table. I would also have to do that on an ad hoc basis.
@telligent wrote:
I want to read data from word tables into sas datasets. It sounds like DDE will be the easiest. Thanks for the responses. I have played around with text analytics in Viya and that imports unstructured data and parses it into parts of speech, that is not what I need. Saving the file as text would also be pretty unstructured. I need the bookmarks that DDE uses to get specific data from the word file into specific variables.
From how many documents? How were the document's tables made? .
If it is not a large number of documents you may find copying the tables from document to spreadsheet and then getting the data from spreadsheet relatively easy (one table per sheet in a workbook). Otherwise the coding for a data step is likely to be challenging.
Or just try the DDE and see if it works. But if things go wonky the option is still available. But if this is a production process needed to work going forward then have the data provided in a different format.
And, as you are using EG, these two scenarios will prevent the use of DDE, anyway:
You could only try DDE if your EG works with the "Local" server, or a SAS workspace server on Windows that has Office installed. In which case you need to transfer the document to the server first.
Please never use DDE. I have spoken about it for years on SAS-L. It was dumped in the 90s because it is simply not very good. For Word, the file (assuming it is a modern format) is a docx. That is a zip file containing XML. Rename the extension to .zip, unzip it, and you will see the contents. Use XML engines at that point which is built into Base.
When I have to do consulting on Word automation with SAS, I read Word using C# and execute SAS via SAS/Integration Technologies. That said, it is a tad more difficult. Chris Hemendinger writes a lot in this area with EG task automation. If you think you have to go for that level, read his book and work. I have spoken about the techniques at SGF as well but my inspiration is Chris so he is best source.
C# is above my head.
Is there a programmatic way to rename a bunch of word (docx) files to .zip?
I can use winzip to zip them all into a single zip file, but I'm not sure how to save each individual file as a .zip (other than manually). Thanks.
Also looking for programmatic ways to save the word files to text in order to use the first solution.
C# isn't bad. You should try it a bit and see how it goes.
Alternatives would be something like PowerShell to automate the rename. Pretty easy to use. Any other language, such as Python, can also be used. Get the files in a dir, loop through them, renaming and saving out.
If you have your filenames in a dataset, a rename can go like this:
data _null_;
set wordfiles;
fvar = catx(" ","ren",fname,scan(fname,1,".")!!".zip","2&1");
infile pipe filevar=fvar end=done;
do until (done);
input;
put _infile_;
end;
run;
You will find all eventual responses in the SAS log.
For batch operations on documents, LibreOffice provides commandline parameters that let you automate a lot of actions, opening files and saving them to another format among them.
Any software that does not provide a sufficient commandline interface for automation is useless **** in my eyes, anyway.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!