07-13-2017 07:25 AM
I need some help. I want to read the text of a Word document into SAS datasets, using the headings in the document as the variables of the datasets.
Example of the Word document:
#2-3-4-5, 2nd cross,
1st Main, NY
Output I need:
obs Name Sex Age Address
01 Jhon Male 25 Years #2-3-4-5, 2nd cross,
1st Main, NY
Can anyone help me find a solution for this?
07-13-2017 07:31 AM
In Word, use File -> Save As and save the file as .txt. Then write a data step to read the text file and output it according to your requirements:
data want;
  length buff name sex address $2000;
  infile "thetextfile.txt";
  input buff $;
  if buff="Name:" then input name $;
  ...
run;
The real question is why you are using Word, a format meant for output that humans review, as a data source. Go back to the source data and work from there; that's really the only "good" way.
07-13-2017 08:05 AM
Thank you for the reply. In the program you provided I need to specify the variables manually. My actual problem is that I am looking for a macro that can extract the headings or bookmarks as variables of a SAS dataset.
07-13-2017 08:23 AM
Since nothing in the Word document gives any clue about column attributes, you can't set them automatically, so you have a lot of work to do anyway. The names are the least of your problems.
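To illustrate why the names are the easy part, here is a rough sketch in Python (not SAS) that picks the variable names up from the text export automatically. It assumes a layout the thread does not confirm: every field starts with "Heading: value", lines without a heading continue the previous field, and a blank line ends a record. The sample text is reconstructed from the desired output in the question, and `parse_records` and `to_csv` are made-up helper names. Every value stays a character string, so types and lengths still have to be decided by hand.

```python
import csv
import io


def parse_records(text):
    """Turn 'Heading: value' lines into a list of dicts, one per record.

    Assumed layout: 'Heading: value' starts a field, a line without a
    heading continues the previous field, a blank line ends the record.
    """
    records, current, last_key = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:                          # blank line closes the record
            if current:
                records.append(current)
            current, last_key = {}, None
            continue
        head, sep, value = line.partition(":")
        if sep and head.strip().isidentifier():
            last_key = head.strip()           # a new heading starts a field
            current[last_key] = value.strip()
        elif last_key is not None:            # continuation of the last field
            current[last_key] += " " + line
    if current:
        records.append(current)
    return records


def to_csv(records):
    """Return (variable names, CSV text); names keep first-seen order."""
    names = []
    for rec in records:
        for key in rec:
            if key not in names:
                names.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=names, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return names, out.getvalue()


# Sample reconstructed from the desired output in the question (an assumption).
sample = """Name: Jhon
Sex: Male
Age: 25 Years
Address: #2-3-4-5, 2nd cross,
1st Main, NY
"""

records = parse_records(sample)
names, csv_text = to_csv(records)
print(names)
print(csv_text)
```

The resulting CSV could then be read with PROC IMPORT or a data step. Note that a continuation line that happens to start with "Word:" would be mistaken for a new heading, which is exactly the kind of thing the quality check has to catch.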
07-13-2017 08:29 AM
Well, I sincerely hope you find what you are looking for. Please tell me if you do.
Just yesterday I had to do the same thing.
I copied the Word data into a suitable text editor, converted the special characters to plain equivalents, made sure the tab delimiters were correct and missing values were filled in, imported the result as a delimited text file, and then ran an extensive quality check.
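For what it's worth, that cleanup could be scripted instead of done in an editor. The following is only a sketch in Python: the replacement table is a guess at what Word typically produces (curly quotes, dashes, non-breaking spaces), `clean_line` is a made-up helper name, and the "." placeholder is an assumption based on how SAS list input reads a single period as missing.

```python
# Hypothetical replacement table; extend it with whatever your file contains.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # long dashes
    "\u00a0": " ",                  # non-breaking space
}


def clean_line(line, n_fields, missing="."):
    """Normalise characters and return exactly n_fields tab-separated values."""
    for bad, good in REPLACEMENTS.items():
        line = line.replace(bad, good)
    fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
    if len(fields) > n_fields:
        # Too many tabs usually means a stray delimiter inside a value.
        raise ValueError(f"expected {n_fields} fields, got {len(fields)}")
    fields += [""] * (n_fields - len(fields))   # pad short rows
    return "\t".join(f if f else missing for f in fields)


cleaned = clean_line("Jhon\t\u201cMale\u201d\t\n", 4)
print(repr(cleaned))
```

Rows with too many fields raise an error rather than being silently truncated, so they show up before the import instead of during the quality check.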
I don't know exactly how, but I think what you want is doable. A *.docx file is nothing more than a ZIP archive of XML files, so it should be feasible to open it up and extract the formatted tables.
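A minimal sketch of that idea, in Python rather than SAS and using only the standard library: open the archive, read word/document.xml, and list each paragraph with its style id plus any bookmark names. The function names are made up for illustration. Style ids such as "Heading1" are what Word often writes for built-in headings, but that depends on the document and its language, so treat it as an assumption and check your own file.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _document_root(path):
    """Parse word/document.xml from a .docx path or file-like object."""
    with zipfile.ZipFile(path) as z:
        return ET.fromstring(z.read("word/document.xml"))


def docx_paragraphs(path):
    """Return a list of (style_id or None, text), one entry per paragraph."""
    out = []
    for p in _document_root(path).iter(W + "p"):
        style = p.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val") if style is not None else None
        # A paragraph's text is split across runs; join all w:t pieces.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        out.append((style_id, text))
    return out


def docx_bookmarks(path):
    """Return the names of all bookmarks in the main document part."""
    root = _document_root(path)
    return [b.get(W + "name") for b in root.iter(W + "bookmarkStart")]
```

Calling `docx_paragraphs("yourfile.docx")` (file name hypothetical) and keeping the entries whose style id looks like a heading would give candidate variable names; the paragraphs that follow each heading would be the values. Tables live in `w:tbl` elements in the same XML and would need similar handling.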
- Cheers -