🔒 This topic is solved and locked.
GreyJoy
Obsidian | Level 7

Here is a challenge for the SAS community. A colleague has given me a Word document containing an output table from a SAS program written years ago. They no longer have access to the original program or the original data, and I need to get the data back into SAS. The problem is that the file is in true report format: multiple headers, headers that span multiple lines, and a very irregular placement of variable names and values, which makes it hard to read in with FILENAME statements.

 

I do not know where to start, which is why I have no code attached, only a snippet of the Word document. Please help. Good luck.

Accepted Solution
sasburger
Fluorite | Level 6
I think your best bet may be to convert your Word doc into a PDF or image file and use a data table extraction tool. These programs read the input as an image rather than as text, and then use OCR to convert it back to text in table form.

ExtractTable offers a free demo version, and you can also request additional free credits, but if you're working with hundreds of pages, you may need to pay for the feature: https://www.extracttable.com/

Tabula lets you click and drag to select tables in a PDF: https://tabula.technology/
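Tables pulled out by tools like these may come back with the multi-line headers described in the question spread across several rows. The following minimal Python sketch is not part of the original reply. It merges such header rows into one valid SAS variable name per column, using letters, digits, and underscores, no leading digit, and at most 32 characters. It rests on two assumptions: the header rows arrive as equal-length lists of strings, and a blank cell in an upper row continues the spanning label to its left. The function name `merge_header_rows` is hypothetical.

```python
import re

def merge_header_rows(header_rows):
    """Combine header text stacked over several rows into one name per column.

    Assumption: in every row except the last, a blank cell continues the
    spanning label to its left (e.g. "Treatment" over "N" and "Mean").
    """
    filled = []
    for row in header_rows[:-1]:
        current = ""  # spanning label carried to the right
        out = []
        for cell in row:
            if cell.strip():
                current = cell.strip()
            out.append(current)
        filled.append(out)
    filled.append([c.strip() for c in header_rows[-1]])

    names = []
    for col in zip(*filled):
        # Join the non-blank pieces of this column's header, top to bottom.
        text = "_".join(piece for piece in col if piece)
        # Replace anything that is not a letter, digit or underscore.
        name = re.sub(r"[^A-Za-z0-9_]+", "_", text).strip("_") or "VAR"
        if name[0].isdigit():
            name = "_" + name  # SAS names may not start with a digit
        names.append(name[:32])  # SAS variable names max out at 32 characters
    return names
```

Duplicate names are not resolved here. If two columns end up with the same name, one of them would need renaming before the data is read into SAS.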

To avoid this problem in the future, please be careful who you vote for for our presidency.

Much Love,
Wendy



6 Replies
data_null__
Jade | Level 19

Your first step is to use Save As to save the file as a text file.
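To illustrate what might come after the Save As step, here is a minimal Python sketch. It drops the lines that repeat on every page of the text file, such as titles, page numbers, and rulers, so that only the table body is left to parse. The regular expressions in `HEADER_PATTERNS` are placeholders and were not taken from the actual document. "The SAS System" is only SAS's default title and may not appear in your file. The function name is hypothetical.

```python
import re

# Hypothetical patterns for lines repeated at the top of every printed page.
# Replace them with whatever actually repeats in your saved-as-text file.
HEADER_PATTERNS = [
    re.compile(r"^\s*The SAS System"),   # assumed: default SAS title line
    re.compile(r"^\s*Page\s+\d+\s*$"),   # assumed: page-number line
    re.compile(r"^\s*-{5,}\s*$"),        # assumed: ruler line of dashes
]

def strip_page_headers(lines):
    """Return the non-blank lines that match none of the header patterns."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between pages and blocks
        if any(p.search(line) for p in HEADER_PATTERNS):
            continue  # skip repeated title, page and ruler lines
        kept.append(line.rstrip("\n"))
    return kept
```

Calling `strip_page_headers(open("report.txt"))` with a hypothetical file name returns the remaining body lines, which still need to be parsed into variables.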

GreyJoy
Obsidian | Level 7

I just want to make sure you are aware that changing the document from .doc to .txt doesn't change the problem. I think I understand what you are getting at, but I am not sure you grasp the real issue. If you need to re-read the post, you can; it's up top.

GreyJoy
Obsidian | Level 7
Thank you SASBurger! You are such an amazing person, a really good amazing person. Really, the best. Yuge, beliveme. Right when the night seemed darkest I had you to insurrect my problem and inoculate me from any further issues I might have. God Bless you and God bless our Czar, Joe.

You are right. In the future I will be more careful with my vote.

Much adoration,
Grey
ballardw
Super User

You may get "a table" from one of the extraction tools, but you will still have a LONG way to go to get a working data set. You will have to parse cells to extract multiple values from a single cell, or different kinds of values that occur within the same column. In a SAS data set, each variable (column) holds a single value per observation: a name, a measurement, a code value, what have you. That document has multiple values stacked in single columns.
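To make the "one value per variable" point concrete, here is a minimal Python sketch. It splits a cell whose values are stacked on separate lines into separately named variables. The stacked layout, the example names, and the function name `split_stacked_cell` are all hypothetical. Adapt them to whatever the extraction tool actually returns.

```python
def split_stacked_cell(cell, names):
    """Map each line of a stacked cell (e.g. "12\n34.5\n(6.7)") to its own variable.

    Missing trailing values become None, the analogue of a SAS missing value.
    """
    parts = [p.strip() for p in cell.splitlines() if p.strip()]
    if len(parts) > len(names):
        raise ValueError(f"cell has {len(parts)} values but only {len(names)} names")
    # Pad so every row ends up with the same set of variables.
    parts += [None] * (len(names) - len(parts))
    return dict(zip(names, parts))
```

Cleaning the individual values, such as removing the parentheses around a standard deviation or converting text to numbers, would be a separate step.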

 

I can and have read such "data" layouts (actually looks like a set of report tables). It is not a trivial or beginning exercise.
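One common pattern for report-style layouts is to carry the most recent group header down onto each detail line beneath it. The Python sketch below shows the idea on a hypothetical layout: a `Region:` header followed by indented item/count lines. The patterns and field names are assumptions and do not reflect the layout of the document in the question. In SAS itself, the same carry-forward is usually done with a RETAIN statement in a DATA step.

```python
import re

# Hypothetical layout: a header line ("Region: East") is followed by
# indented detail lines ("  Widget   10") until the next header appears.
HEADER = re.compile(r"^\s*Region:\s*(?P<region>\S.*?)\s*$")
DETAIL = re.compile(r"^\s+(?P<item>\S+)\s+(?P<count>\d+)\s*$")

def parse_report(lines):
    """Turn a header/detail report into flat records, one per detail line."""
    records = []
    region = None  # current group value, carried forward like a RETAINed variable
    for line in lines:
        m = HEADER.match(line)
        if m:
            region = m.group("region")
            continue
        m = DETAIL.match(line)
        if m:
            records.append({"region": region,
                            "item": m.group("item"),
                            "count": int(m.group("count"))})
        # Any other line (titles, rulers, blanks) is ignored.
    return records
```

The resulting records could then be written to CSV, for example with Python's `csv.DictWriter`, and read into SAS with PROC IMPORT.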

 

