Hi everyone
I know that SAS could open an existing Excel document and run a VBA code stored in that document, no problem.
However
Could SAS create a new Excel document, insert a VBA code into a module in the document (the VBA code would be available as part of the SAS code) and then save it as .xlm and execute it?
Thank you
Well, to cover all your points. First, the right tool/process for this task is to go back to the actual data used to create those reports in the first place. This is one key area where no guidance appears to be given, there is a clear separaion between
- Outputs for human review, these can use Office and such like to show the data in a nice layout and such like, but are totally useless for anything else
- Data, this is the raw information, used for programming, databasing, analysing and such like, and is kept in a form that is not easily readable.
Now if your stuck with the first, and I can only really state again, go back to the source data, then your in all kinds of trouble, not just from physically processing it, but traceability, validation etc. So step 1, minimise what you use, if you go down the route of VBA, then do it in word, not much point in using Word and Excel, VBA is present in both. Here is an example command line parameter to extract text:
https://stackoverflow.com/questions/5671988/how-to-extract-just-plain-text-from-doc-docx-files-unix
This assumes of course it is DOCX, you are using that format correct? Format matters as the older Office formats doc and xls are binary proprietary file formats, so difficult to read or use. docx and xlsx are MS attempt to open source their file formats and these are zip files which contain XML and other components, so you can unzip them and access all the parts of the document by parsing the XML. Once you have the plain text, then its a simple import, its getting it into the text which is the real problem as Office docs can contain any number of things.
There are other command lines, I haven't used them myself so can't comment on how effective they are, for instance:
https://sourceforge.net/blog/this-command-line-utility-converts-docx-files-to-text/
You might also consider software to analyse text, e.g. https://www.sas.com/en_us/software/text-miner.html
Personally I would throw a wobbly before trying the process your talking about.
For this point: "I agree DDE is ancient and X command shouldn't be used, but as others previously commented, sometimes it is better to use the right took for the right task" - yes, use the right tool for the task, using a tech that is well over 10 years old, not supported by MS anymore, and prone to not working is never going to be the "right" tool.
Sort of, there's a paper on lexjansen.com that demonstrates this.
Whilst it may be technically possible (although you would likely have to use DDE which is ancient and shouldn't really be used), the question is why would you be doing this? Do you think there is something SAS cannot do, or do you have some VBA code already you need to use? If so then use that code in a template file and dump out your SAS data to the template file which already contains the VBA code. Trying to write VBA from SAS will just cause you lots of headaches.
Well, to cover all your points. First, the right tool/process for this task is to go back to the actual data used to create those reports in the first place. This is one key area where no guidance appears to be given, there is a clear separaion between
- Outputs for human review, these can use Office and such like to show the data in a nice layout and such like, but are totally useless for anything else
- Data, this is the raw information, used for programming, databasing, analysing and such like, and is kept in a form that is not easily readable.
Now if your stuck with the first, and I can only really state again, go back to the source data, then your in all kinds of trouble, not just from physically processing it, but traceability, validation etc. So step 1, minimise what you use, if you go down the route of VBA, then do it in word, not much point in using Word and Excel, VBA is present in both. Here is an example command line parameter to extract text:
https://stackoverflow.com/questions/5671988/how-to-extract-just-plain-text-from-doc-docx-files-unix
This assumes of course it is DOCX, you are using that format correct? Format matters as the older Office formats doc and xls are binary proprietary file formats, so difficult to read or use. docx and xlsx are MS attempt to open source their file formats and these are zip files which contain XML and other components, so you can unzip them and access all the parts of the document by parsing the XML. Once you have the plain text, then its a simple import, its getting it into the text which is the real problem as Office docs can contain any number of things.
There are other command lines, I haven't used them myself so can't comment on how effective they are, for instance:
https://sourceforge.net/blog/this-command-line-utility-converts-docx-files-to-text/
You might also consider software to analyse text, e.g. https://www.sas.com/en_us/software/text-miner.html
Personally I would throw a wobbly before trying the process your talking about.
For this point: "I agree DDE is ancient and X command shouldn't be used, but as others previously commented, sometimes it is better to use the right took for the right task" - yes, use the right tool for the task, using a tech that is well over 10 years old, not supported by MS anymore, and prone to not working is never going to be the "right" tool.
Thank you again @RW9
I really appreciate the time and effort you have put into your reply and answering this thread
I will explain things further - for your kind information- and i think you would afterwards say that I am definitely "not using the right tool for the right task"
The documents themselves have been created over many years, they were not intended for data collection, but to serve as reports for human readers, in such, there data was entered as free text. Even worse, as more than one person entered these data, there was no standardized system in writing the report, and different people described the results in different way (through there were some distinct patterns that I have been trying to exploit using Perl expressions)
The documents are stored as .DOC which is even worse. When I used SAS to extract data directly, I had to write a code that would re-save them as rtf files that would be easier to read with SAS. Even after doing this the result was a complete mess.
SO as you can see, the whole project is a bit of a mess that I am trying to salvage. One might think that a better alternative would be for a human reader to extract data manually from the results. Believe me i have been there and have done that many years ago with other projects, but the efficiency and reliability of humans significantly drops for such tasks after only doing a relatively small part of the project.
So my approach now is to combine both, to try to use SAS / other tools to extract as much as possible and then to use the human part to 1. verify output and tweak the code further 2. extract data that could not be extracted through step 1.
I guess another aspect of this is for my own learning. Through this process I learned a few things about using SAS Perl Regular expression, string manipulation, some new aspects in VBA etc....
So at the end of the day, is there a "right tool" to do what I am trying to do? The answer is definitely not, but sometimes the goal is the journey and the challenge and the learning process more than the final result (if I want to be philosophical here!)
As for using VBA in word, I havent thought about that, thanks for the suggesion, though iI would imagine that I would still need to export the data to an intermediary storage like excel or access...
Anyhow. I thank you again for your answer. Your and other people's answers clearly illustrate the important role that SAS Communities play to make life easier for people like me....
No probs. Yes, I have heard of this being done elsewhere, some have tried to make me do it as well without success. If you are using the old DOC format, then I would still recommend using VBA to dump everything out to text rather than go through Excel. But even then if each file is different its going to be a change in each case - styles and such like as you say will change. Possibly using a command line tool to dump DOCX/DOC to text would save you some time, you can get proper paid tools or free ones. Then you only have to worry about proccessing the text output.
Anyways, good luck with it
LibreOffice / OpenOffice can be run from the commandline to automatically do conversions. You could try if that can be used to do a batch conversion to raw text.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.