Hi all,
I have a Word document that contains just one table. Is it possible to read the width (in characters) of each column of the table?
Thank you.
Suppose the top grid line of the table looks like +-----+----------------------------------+--------+
then the width of the Nth column is: width = length(scan(row,n,'+'));
where row contains the grid line of the table and n is the sequence number of the column.
The 3rd argument is the delimiter.
If your table is drawn with other characters (not + and -), you need to replace the delimiter with the right one.
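To make the idea concrete, here is a minimal sketch that loops over every column of that sample grid line instead of one n at a time. The dataset name col_widths and the variables ncols, n and width are made up for illustration, and it assumes the grid really is delimited by '+'.

```sas
data col_widths;
    /* Grid line copied from the example above */
    row = '+-----+----------------------------------+--------+';
    /* TRIM drops trailing blanks so they are not counted as an extra word */
    ncols = countw(trim(row), '+');
    do n = 1 to ncols;
        /* each "word" between '+' signs is one column's run of dashes */
        width = length(scan(row, n, '+'));
        output;
    end;
    keep n width;
run;

proc print data=col_widths noobs;
run;
```

The leading and trailing '+' are ignored by SCAN and COUNTW by default, so the first word is the first column.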
I have an ordinary .doc document with a table:
It seems the real question is whether SAS can read a .doc file into a dataset.
Save it as a .txt file and try to read it:
filename mytext "<path and file_name.txt>";
data test;
     infile mytext truncover;
     input;          /* load the whole record into the _infile_ buffer */

    file log;
    put _infile_;    /* echo the record to the log */
run;
Check the log to see what your file looks like.
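If the saved text file does turn out to contain a grid line like the one discussed earlier, something along these lines could put the widths into a macro variable. This is only a sketch under that assumption: Word may well write the table differently, and the macro variable name col_widths and the variables row and widths are invented for the example.

```sas
/* Assumption: the text file contains an ASCII grid line such as +-----+------+ */
filename mytext "<path and file_name.txt>";

%let col_widths=;   /* stays empty if no grid line is found */

data _null_;
    infile mytext truncover;
    input;                               /* load the record into _infile_ */
    length row $32767 widths $200;
    row = strip(_infile_);
    if row =: '+' then do;               /* first line that starts with '+' */
        do n = 1 to countw(trim(row), '+');
            /* append each column width to a blank-separated list */
            widths = catx(' ', widths, length(scan(row, n, '+')));
        end;
        call symputx('col_widths', widths);
        stop;                            /* only the first grid line is needed */
    end;
run;

%put NOTE: Column widths: &col_widths;
```

Individual widths can then be picked out with %scan(&col_widths, 1), %scan(&col_widths, 2) and so on.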
Would it be possible not to save .doc as .txt?
Should Visual Basic or something else be used?
Open the .doc file in Notepad or Notepad++ and you will see that
the format is very complicated.
Saving a .doc file as .txt is easy. It will not damage the .doc file.
Try it.
@DmytroYermak wrote:
Would it be possible not to save .doc as .txt?
Should Visual Basic or something else be used?
There are LOTS of not-displayed characters in a doc file to indicate things like change of font size, font face, bold/underline/italic, start/end of table/cell, placement of image, flow of text around image, lists and many other codes. Not to mention an awful lot of "style definition" elements at the start of a doc file.
If you save the doc to RTF format and examine the RTF file with a plain text editor you will get an idea about how much "stuff" is embedded in the body of a document.
SAS really is not set up to read such stuff as documents are not nice "data". I also doubt that Microsoft has released any "data engine" for Word files that could be used to extract "data". So there is no "nice" API or ODBC connection that can access just the table(s).
SAS does not have any methods for reading tables from Word files. You will have more luck if you convert the table into a spreadsheet file. SAS does have ways to read tables from spreadsheets and attempt to make sense of them. You should be able to open the DOC file, select the table, copy it, and then paste it into a blank spreadsheet.
The thing is, I want to know the widths of the columns so I can put them into a macro variable. Do you think the widths will be kept?
What programming language can be used to implement it?
@DmytroYermak wrote:
The thing is, I want to know the widths of the columns so I can put them into a macro variable. Do you think the widths will be kept?
What programming language can be used to implement it?
Since you are dealing with a .doc file, VBA might be one way to count characters, but I don't have enough experience with VB to bother trying.
And how do you intend to use the macro variable(s)? Can you show some example code that would use the value(s)?
Counts of characters are pretty much only useful with a fixed-width font such as Courier or SAS Monospace, and even then you need to pay attention to font size for most purposes.
