The SAS Output Delivery System and reporting techniques

MS Word to SAS Help

N/A
Posts: 0

MS Word to SAS Help

Not sure if this is the correct forum - but does anyone have any insight or papers on how to import MS Word documents into SAS?

Thanks!
SAS Super FREQ
Posts: 8,743

Re: MS Word to SAS Help

Hi:
This SUGI paper, http://www2.sas.com/proceedings/sugi31/019-31.pdf, entitled "Reading Microsoft Word XML Files With SAS", outlines a method by which you could read Word XML files, parse out information or tables, and turn that information into a SAS dataset. Microsoft Word XML files are NOT the same kind of file as a .DOC file -- so, in order to follow what the paper presents, you would first have to resave your Word doc as an XML file.
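To give a feel for what "parsing out tables" from Word XML involves, here is a minimal sketch. It is written in Python rather than SAS and is not the method from the paper. It assumes the document was saved in the Word 2003 XML format (the `wordml` namespace shown below); the `extract_tables` function and the sample table are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: Word 2003 "Save As > XML Document" namespace.
# Newer .docx files use a different namespace and are zipped packages.
W = "{http://schemas.microsoft.com/office/word/2003/wordml}"


def extract_tables(xml_text):
    """Return one list of rows per table; each row is a list of cell strings."""
    root = ET.fromstring(xml_text)
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.iter(W + "tr"):
            cells = []
            for tc in tr.iter(W + "tc"):
                # The text of one cell may be split across several w:t runs.
                text = "".join(t.text or "" for t in tc.iter(W + "t"))
                cells.append(text.strip())
            rows.append(cells)
        tables.append(rows)
    return tables


# Made-up sample: one table with a header row and one data row.
sample = """<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:body><w:tbl>
<w:tr><w:tc><w:p><w:r><w:t>Name</w:t></w:r></w:p></w:tc>
      <w:tc><w:p><w:r><w:t>Age</w:t></w:r></w:p></w:tc></w:tr>
<w:tr><w:tc><w:p><w:r><w:t>Ann</w:t></w:r></w:p></w:tc>
      <w:tc><w:p><w:r><w:t>34</w:t></w:r></w:p></w:tc></w:tr>
</w:tbl></w:body></w:wordDocument>"""

tables = extract_tables(sample)
print(tables)  # [[['Name', 'Age'], ['Ann', '34']]]
```

The sketch keeps each table as its own list and makes no attempt to decide which row holds the column names. It also does not handle tables nested inside table cells.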

I guess I'm not really clear on what you want/need to do. To me, there's a HUGE difference between having a Word doc that is all tables and having a Word doc that is the great American novel.

I can easily imagine having a Word doc with a single data table where I want that data table turned into a SAS dataset. But what if my Word doc has multiple tables...does each table become a separate dataset? Do all the separate tables constitute a single dataset? What if the table has more than one header column, what column should be used for the column names? That all seems somewhat problematic to me. If you're going to end up resaving your Word document in XML form, you might just as well cut and paste your Word tables into Excel and then use SAS to import from Excel directly into SAS dataset form.

It is harder to imagine trying to import the great American novel...or even a minor memo into SAS dataset format. If you have a true "document" then you might want to investigate using a product like Text Miner (http://support.sas.com/software/91x/tmwhatsnew31.htm) to find all the references to "wombats" in the document. Or scan a bunch of documents searching for "not happy" and "unhappy" or "wonderful" and "outstanding".
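For the simplest version of that kind of search, a plain keyword count is enough. The sketch below is in Python, is nowhere near what Text Miner does, and only illustrates counting whole-word, case-insensitive phrase matches in text that has already been extracted from a document. The `count_phrases` function and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def count_phrases(text, phrases):
    """Count case-insensitive, whole-word occurrences of each phrase."""
    counts = Counter()
    for phrase in phrases:
        # \b keeps a search for "happy" from matching inside "unhappy".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Made-up sample text.
memo = "I was not happy with the wait. Unhappy guests left, but the food was wonderful."
counts = count_phrases(memo, ["not happy", "unhappy", "wonderful", "outstanding"])
for phrase, n in counts.items():
    print(phrase, n)
```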

The bottom line is that because Microsoft has never published the Word .DOC binary format, it would be extremely hard to read a "true" .doc file with SAS -- whether you have a data table or a memo that you want to read into SAS format.

Tech Support is probably your best bet for figuring out how to accomplish what you need to do. Go to this site to find out how to contact Tech Support:
http://support.sas.com/techsup/contact/index.html

Good luck!
cynthia

ps...but I must admit, I am truly curious about what kind of information or document you have in Word that you want to import into SAS.
N/A
Posts: 0

Re: MS Word to SAS Help

Hey Cynthia,

Sorry I wasn't more specific. The .doc is a survey document, something like this:

1. What is your favorite color?

a. Blue
b. Green
c. Yellow

2. Favorite dog?

etc, etc

The goal is to take that information and create one column with the question and the next with the response.

We have built a SAS survey system, but one of the main time-consuming steps is putting the submitted Word document into an Excel doc and then importing it into a SAS dataset.

I will try the xml and see how it works.

Thanks as always for your help.

Brad
SAS Super FREQ
Posts: 8,743

Re: MS Word to SAS Help

Oh! Of course, I hadn't thought about surveys. When I worked with surveys, almost all of our data entry folks got downsized and retrained because of those scannable/OCR forms. That's an interesting problem.

If you could have them DELETE every choice except their choice from the Word doc and THEN resave the Word doc as a .TXT file, you could probably just read the file with a DATA step program.
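The parsing logic for that .TXT approach could look roughly like the sketch below. It is in Python rather than a DATA step, purely to show the idea. It assumes questions start with a number and a period ("1. ...") and that the one remaining choice starts with a letter and a period ("a. ..."). The function names and the "Poodle" answer are made up for illustration.

```python
import csv
import io
import re

QUESTION = re.compile(r"^\s*\d+\.\s*(.+?)\s*$")     # e.g. "1. What is your favorite color?"
CHOICE = re.compile(r"^\s*[A-Za-z]\.\s*(.+?)\s*$")  # e.g. "a. Blue"


def parse_survey(lines):
    """Yield (question, response) pairs from the lines of a trimmed survey."""
    question = None
    for line in lines:
        m = QUESTION.match(line)
        if m:
            question = m.group(1)  # remember the current question
            continue
        m = CHOICE.match(line)
        if m and question is not None:
            yield question, m.group(1)


def to_csv(pairs):
    """Render the pairs as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["question", "response"])
    writer.writerows(pairs)
    return out.getvalue()


# Made-up sample: the respondent deleted every choice except their own.
survey_text = """1. What is your favorite color?

a. Blue

2. Favorite dog?

b. Poodle
"""

pairs = list(parse_survey(survey_text.splitlines()))
csv_text = to_csv(pairs)
print(csv_text)
```

The result has one column for the question and one for the response, and a CSV file like this is something SAS can import. If a respondent leaves more than one choice under a question, the sketch emits one row per remaining choice, so that case would need to be checked separately.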

But XML is probably a better choice. I don't know how you'd figure out which of the survey items had been selected, unless you read the Microsoft documentation on the XML they use.

Anyway, thanks! & good luck with this!
cynthia