BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ammarhm
Lapis Lazuli | Level 10

Hi everyone

I know that SAS could open an existing Excel document and run a VBA code stored in that document, no problem.

 

However

 

Could SAS create a new Excel document, insert a VBA code into a module in the document (the VBA code would be available as part of the SAS code) and then save it as .xlm and execute it?

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, to cover all your points.  First, the right tool/process for this task is to go back to the actual data used to create those reports in the first place.  This is one key area where no guidance appears to be given, there is a clear separaion between

- Outputs for human review, these can use Office and such like to show the data in a nice layout and such like, but are totally useless for anything else

- Data, this is the raw information, used for programming, databasing, analysing and such like, and is kept in a form that is not easily readable.

 

Now if your stuck with the first, and I can only really state again, go back to the source data, then your in all kinds of trouble, not just from physically processing it, but traceability, validation etc.  So step 1, minimise what you use, if you go down the route of VBA, then do it in word, not much point in using Word and Excel, VBA is present in both.  Here is an example command line parameter to extract text:

https://stackoverflow.com/questions/5671988/how-to-extract-just-plain-text-from-doc-docx-files-unix

This assumes of course it is DOCX, you are using that format correct?  Format matters as the older Office formats doc and xls are binary proprietary file formats, so difficult to read or use.  docx and xlsx are MS attempt to open source their file formats and these are zip files which contain XML and other components, so you can unzip them and access all the parts of the document by parsing the XML.  Once you have the plain text, then its a simple import, its getting it into the text which is the real problem as Office docs can contain any number of things.

 

There are other command lines, I haven't used them myself so can't comment on how effective they are, for instance:

https://sourceforge.net/blog/this-command-line-utility-converts-docx-files-to-text/

 

You might also consider software to analyse text, e.g. https://www.sas.com/en_us/software/text-miner.html

Personally I would throw a wobbly before trying the process your talking about.

 

 

For this point: "I agree DDE is ancient and X command shouldn't be used, but as others previously commented, sometimes it is better to use the right took for the right task" - yes, use the right tool for the task, using a tech that is well over 10 years old, not supported by MS anymore, and prone to not working is never going to be the "right" tool.

View solution in original post

9 REPLIES 9
Reeza
Super User

Sort of, there's a paper on lexjansen.com that demonstrates this.

ammarhm
Lapis Lazuli | Level 10
Thank you @Reeza
Do you have the name of that paper as I have been searching in vain
Thanks again
Kind regards
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Whilst it may be technically possible (although you would likely have to use DDE which is ancient and shouldn't really be used), the question is why would you be doing this?  Do you think there is something SAS cannot do, or do you have some VBA code already you need to use?  If so then use that code in a template file and dump out your SAS data to the template file which already contains the VBA code.  Trying to write VBA from SAS will just cause you lots of headaches.

ammarhm
Lapis Lazuli | Level 10
Well, the story started with 3000 word documents that contained data in tables. I started using SAS to directly extract data from these documents. That proved to be pain stacking, slow and cumbersome. Approach B was to write a VBA in an excel file that would extract the data from the word documents into the excel sheet, which would then be imported to SAS for further analysis.
The way the solution is done now is by SAS executing the Macro stored in the excel document and then importing final results. This means I will need to maintain 2 documents, the SAS code and the excel file. I was just hoping to make it easier by maintaining one document only.
I hope this makes it easier to understand why I was looking for this solution
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, to cover all your points.  First, the right tool/process for this task is to go back to the actual data used to create those reports in the first place.  This is one key area where no guidance appears to be given, there is a clear separaion between

- Outputs for human review, these can use Office and such like to show the data in a nice layout and such like, but are totally useless for anything else

- Data, this is the raw information, used for programming, databasing, analysing and such like, and is kept in a form that is not easily readable.

 

Now if your stuck with the first, and I can only really state again, go back to the source data, then your in all kinds of trouble, not just from physically processing it, but traceability, validation etc.  So step 1, minimise what you use, if you go down the route of VBA, then do it in word, not much point in using Word and Excel, VBA is present in both.  Here is an example command line parameter to extract text:

https://stackoverflow.com/questions/5671988/how-to-extract-just-plain-text-from-doc-docx-files-unix

This assumes of course it is DOCX, you are using that format correct?  Format matters as the older Office formats doc and xls are binary proprietary file formats, so difficult to read or use.  docx and xlsx are MS attempt to open source their file formats and these are zip files which contain XML and other components, so you can unzip them and access all the parts of the document by parsing the XML.  Once you have the plain text, then its a simple import, its getting it into the text which is the real problem as Office docs can contain any number of things.

 

There are other command lines, I haven't used them myself so can't comment on how effective they are, for instance:

https://sourceforge.net/blog/this-command-line-utility-converts-docx-files-to-text/

 

You might also consider software to analyse text, e.g. https://www.sas.com/en_us/software/text-miner.html

Personally I would throw a wobbly before trying the process your talking about.

 

 

For this point: "I agree DDE is ancient and X command shouldn't be used, but as others previously commented, sometimes it is better to use the right took for the right task" - yes, use the right tool for the task, using a tech that is well over 10 years old, not supported by MS anymore, and prone to not working is never going to be the "right" tool.

ammarhm
Lapis Lazuli | Level 10

Thank you again @RW9

I really appreciate the time and effort you have put into your reply and answering this thread

 

I will explain things further - for your kind information- and i think you would afterwards say that I am definitely "not using the right tool for the right task"

The documents themselves have been created over many years, they were not intended for data collection, but to serve as reports for human readers, in such, there data was entered as free text. Even worse, as more than one person entered these data, there was no standardized system in writing the report, and different people described the results in different way (through there were some distinct patterns that I have been trying to exploit using Perl expressions)

The documents are stored as .DOC which is even worse. When I used SAS to extract data directly, I had to write a code that would re-save them as rtf files that would be easier to read with SAS. Even after doing this the result was a complete mess.

SO as you can see, the whole project is a bit of a mess that I am trying to salvage. One might think that a better alternative would be for a human reader to extract data manually from the results. Believe me i have been there and have done that many years ago with other projects, but the efficiency and reliability of humans significantly drops for such tasks after only doing a relatively small part of the project.

So my approach now is to combine both, to try to use SAS / other tools to extract as much as possible and then to use the human part to 1. verify output and tweak the code further 2. extract data that could not be extracted through step 1.

I guess another aspect of this is for my own learning. Through this process I learned a few things about using SAS Perl Regular expression, string manipulation, some new aspects in VBA etc....

So at the end of the day, is there a "right tool" to do what I am trying to do? The answer is definitely not, but sometimes the goal is the journey and the challenge and the learning process more than the final result (if I want to be philosophical here!)

As for using VBA in word, I havent thought about that, thanks for the suggesion, though iI would imagine that I would still need to export the data to an intermediary storage like excel or access...

Anyhow. I thank you again for your answer. Your and other people's answers clearly illustrate the important role that SAS Communities play to make life easier for people like me....

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

No probs.  Yes, I have heard of this being done elsewhere, some have tried to make me do it as well without success.  If you are using the old DOC format, then I would still recommend using VBA to dump everything out to text rather than go through Excel.  But even then if each file is different its going to be a change in each case - styles and such like as you say will change.  Possibly using a command line tool to dump DOCX/DOC to text would save you some time, you can get proper paid tools or free ones.  Then you only have to worry about proccessing the text output.

Anyways, good luck with it Smiley Happy

ammarhm
Lapis Lazuli | Level 10
Thank you @RW9
I agree DDE is ancient and X command shouldn't be used, but as others previously commented, sometimes it is better to use the right took for the right task, or VBA 8s better equipped (I think) to handle word documents

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1163 views
  • 3 likes
  • 4 in conversation