deleted_user
Not applicable
I have a project where people are using WORD as a "database".

I swear I thought I briefly read a SUGI paper a while back that demo'd how to import a WORD doc as an XML but either I can't find it or I'm mistaking it for something else.

TFM has an example of importing XML from ACCESS, but that approach doesn't seem to work for WORD.

I've resorted to writing VBA macros to save the files as CSV, but I was hoping for something a little more elegant.

Does anybody know of a way to import a word-created XML document? I guess in particular, I'd need a schema and an xmltype, although I'm guessing here.

Or, even better, read a WORD doc directly (LIBNAME wdoc MSWORD 'c:\dream_on.doc')
Cynthia_sas
SAS Super FREQ
Hi:
It was this paper by Larry Hoyle:
http://www2.sas.com/proceedings/sugi31/019-31.pdf
entitled: "Reading Microsoft Word XML files with SAS®".

Although his example is not quite so easy as you dream of, his solution does show building an XMLMAP to read the Word XML.
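
To make the structure an XMLMAP has to target more concrete, here is a minimal sketch, in Python rather than SAS and not taken from the paper, that walks the table elements in a Word 2003 "Save as XML" file. The function name `word2003_tables` and the `SAMPLE` document are made up for illustration, and the sketch assumes simple, non-nested tables.

```python
import xml.etree.ElementTree as ET

# Namespace used by Word 2003 "Save as XML" documents (WordprocessingML 2003).
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def word2003_tables(xml_text):
    """Return every table as a list of rows; each row is a list of cell strings.

    Hypothetical helper: assumes tables are not nested inside other tables.
    """
    root = ET.fromstring(xml_text)
    tables = []
    for tbl in root.iter(W + "tbl"):          # w:tbl = one table
        rows = []
        for tr in tbl.findall(W + "tr"):      # w:tr = one row
            row = []
            for tc in tr.findall(W + "tc"):   # w:tc = one cell
                # Cell text is split across w:p / w:r / w:t elements.
                row.append("".join(t.text or "" for t in tc.iter(W + "t")))
            rows.append(row)
        tables.append(rows)
    return tables


# Made-up two-row table standing in for one of the Word "database" forms.
SAMPLE = (
    '<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">'
    "<w:body><w:tbl>"
    "<w:tr><w:tc><w:p><w:r><w:t>id</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>name</w:t></w:r></w:p></w:tc></w:tr>"
    "<w:tr><w:tc><w:p><w:r><w:t>101</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>Smith</w:t></w:r></w:p></w:tc></w:tr>"
    "</w:tbl></w:body></w:wordDocument>"
)

tables = word2003_tables(SAMPLE)
```

The same nesting (table, row, cell, then text runs) is what the paths in an XMLMAP would have to spell out, which is why the approach depends on every document using the same table layout.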

The downside I see is that you'd have to be sure that their Word "database" always was saved as XML and that they ALWAYS followed the same table formats.

I don't suppose you could convince them to at least move into Excel?? Oh, hey, I have an idea. There are these great things called index cards, they're paper, see. And if you write your data on the index cards in pencil, then you can flip through the cards and review your data and even change it with this other invention called an eraser. And, you can sort them, by hand! It's so much fun!

Sorry, I couldn't resist! I'm sure they have a very good reason for keeping their info in Word. And the next time we're at the same user-group meeting, come and find me and I'll buy you a coffee and tell you the story about the Word Processor Student Information System!

cynthia
deleted_user
Not applicable
Thank you, Cynthia.

I just refound it a minute or two before your response and was in the process of reading and attempting to digest it.

Just to make things a little more difficult - they have two forms, developed by two different (and somewhat competitive) agencies that were independent before the merger.

Sounds like your WPSIS had a common ancestor with this.

When we have that coffee, I'll tell you the tale of pdf invoices going through the email.

Oh, and btw, your index card scenario is not that removed from the actual situation. The word docs are emailed, printed, and then, and you must have seen this coming, rekeyed. It gets better, though.
Cynthia_sas
SAS Super FREQ
Not just trying to reinvent the wheel but the road it rolls on, the vehicle on top of the wheel, the fuel for the vehicle and even the infrastructure to move the vehicle from one side of the Grand Canyon to the other side (re-keying). Sigh! Well, good luck!

cynthia
deleted_user
Not applicable
Just to compound matters, I've been reading/researching the Hoyle paper. It looks like the stuff he presents is for Word 2003 XML.

Looks like the folks over at MS are reinventing their own wheel and apparently coming up with an entirely new schema for their XML, ECMA Office Open XML, in their newest product.
Cynthia_sas
SAS Super FREQ
I've played around with Word 2007/Office 2007 and it looks (based on a very quick look) like the spec for the 2007 XML will build on, but be different from, the Office 2003 XML. This is coming from, after all, the company that wrote their own flavor of HTML. And who ALSO wrote the RTF spec so the documents could be shared across disparate Word Processing applications (ClarisWorks, Nisus Writer, AppleWorks, StarOffice, etc.).

When you go to save your documents in Vista, using Office 2007, the default is to save as .DOCX, .PPTX and .XLSX -- which are apparently the XML 2007 flavors. If you want to save as "old" Office 97-2003, there is a different button to do that and then you get the "old" version file extension -- .DOC, .PPT and .XLS.
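
For what it's worth, a .DOCX file is a ZIP package with the XML parts stored inside it, so the XML has to be pulled out of the archive before anything can parse it. Below is a minimal sketch in Python (not SAS) of that step. The names `docx_paragraphs` and `make_demo_docx` are hypothetical, the sketch assumes the main part sits at the conventional `word/document.xml` path, and the demo package is a stand-in containing only that one part, not a file Word itself would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML vocabulary in Office Open XML.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph in a .docx (path or file-like object).

    Assumes the main document part is stored as word/document.xml.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # Paragraph text is split across w:r / w:t elements; join the pieces.
    return [
        "".join(t.text or "" for t in p.iter(W + "t"))
        for p in root.iter(W + "p")
    ]


def make_demo_docx():
    """Build a tiny in-memory stand-in holding only the part read above."""
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


paragraphs = docx_paragraphs(make_demo_docx())
```

Tables in this format use the same table, row, and cell nesting as the 2003 XML but under the namespace shown here, so a map built for the 2003 schema would not match without changes.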

Klunky as it sounds, having a VBScript to resave the Word doc as CSV or TAB delimited doesn't sound as terrible to me as trying to reverse-engineer Microsoft XML. (But that's just my opinion.)

cynthia
deleted_user
Not applicable
We're still using Word 2003 here - probably will until 3003.

But I have some time to kill and,

I took this online course about a year or so ago from some really great instructors - I forget who at the moment (guess I'm having an Alzheimer's moment) - and I've always felt bad that there are some parts of it that have gone untried. And,

We have a god-awful lot of Word documents floating around this place.
