03-31-2013 08:20 PM
I just read an article in eDiscovery Times about Beyond Recognition:
The author makes the point that OCR technology has not improved in the last 5 to 10 years, but that the Beyond Recognition (BR) application enables a much more sophisticated technique called "glyph clustering." I'm interested in this because we have to OCR all of our documents that have been digitized from paper. When the paper copy is not in good condition, it can be very difficult to get a good image and therefore an accurate representation of the text.
Has anyone used BR and what has been your experience with it?
04-02-2013 12:38 PM
I'd also be interested if anyone has used BR. OCR does not work well on my surveys that contain handwritten comments, but this BR approach sounds promising.
04-07-2013 05:33 PM
BeyondRecognition provides a number of document processing technologies that go well beyond creating text from images. One of its key functions is classifying documents based on visual similarity, NOT on textual comparison. BR's visual similarity approach normalizes documents regardless of the type of container file: Word docs, PDFs printed directly from those Word docs, and scanned TIF or image-only PDF copies made from paper printouts of those files all get classified together despite differences in resolution or orientation. Well logs, maps, and graphs can be classified based on their appearance. The classification occurs automatically and is scalable to large collections or business processes. BR's visual coding can be used to quickly and accurately extract data elements from the classes for use in downstream data analytics programs.
For more information on BR's text creation, visual classification, visual coding, and logical document boundary determination capabilities, see the BR blog and website at: http://beyondrecognition.net/resources/document-u-blog/
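To make the idea of resolution-independent visual similarity concrete, here is a minimal sketch using a generic "average hash" fingerprint. This is NOT BR's algorithm (which is not described in this thread); it is a well-known stand-in technique. The names `average_hash`, `hamming`, and `make_page` are hypothetical helpers, the pages are synthetic, and the sketch handles resolution differences only, not orientation.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels (0 = black, 255 = white), row-major


def average_hash(img: Image, grid: int = 4) -> int:
    """Fingerprint a page: reduce it to grid x grid block means, then set one
    bit per block depending on whether it is brighter than the overall mean.
    Assumes the image is at least grid x grid pixels."""
    h, w = len(img), len(img[0])
    cells = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def make_page(size: int, dark_top: bool) -> Image:
    """Synthetic square page: one half dark (text-like), the other half white."""
    half = size // 2
    return [[0 if (y < half) == dark_top else 255 for _ in range(size)]
            for y in range(size)]


# The same layout at two resolutions, plus a different layout.
low_res = average_hash(make_page(8, dark_top=True))
high_res = average_hash(make_page(16, dark_top=True))
other = average_hash(make_page(8, dark_top=False))
```

Here `low_res` and `high_res` come out identical even though the page sizes differ, while `other` differs in every bit. A real system would need far more robust features, but the principle of comparing appearance rather than extracted text is the same.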
04-07-2013 06:17 PM
One, can BR be set to automatically identify and delete tables that are contained in a document?
Two, what kind of classification and categorization can it perform? E.g., given a bunch of documents (say 500,000 to 1 million or so), can it automagically group those documents into clusters that differentiate the documents' contents?
04-07-2013 06:57 PM
BR has a "negation" process that can remove or delete certain content, and, depending on what the tables look like, negation could be used to remove them. BR can also be used to redact specific terms.
To use your terminology, BR can "automagically" cluster millions of documents based on their visual appearance. The visual classification may provide sufficient differentiation, or you may want to use visual coding to base differentiation on different coded values. For example, visual classification would put contracts of a certain type in a visual classification. To identify contracts with a specific customer or from a specific zip code, visual coding could be used to create fielded data for customer name or customer zip code - such fields can then be used to differentiate within the classification.
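As a rough illustration of that two-step workflow (classify first, then differentiate within a class using fielded data), here is a small sketch. The records, the field names `visual_class`, `customer_name`, and `customer_zip`, and both helper functions are made up for the example; this is not BR's output format or API, just one way the resulting data could be used downstream.

```python
from collections import defaultdict

# Hypothetical records, as they might look after classification and coding.
docs = [
    {"id": "D1", "visual_class": "contract_A", "customer_name": "Acme", "customer_zip": "77002"},
    {"id": "D2", "visual_class": "contract_A", "customer_name": "Globex", "customer_zip": "10001"},
    {"id": "D3", "visual_class": "well_log", "customer_name": None, "customer_zip": None},
    {"id": "D4", "visual_class": "contract_A", "customer_name": "Acme", "customer_zip": "77002"},
]


def group_by_class(records):
    """Step 1: bucket documents by their (assumed) visual classification."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["visual_class"]].append(rec)
    return dict(groups)


def filter_by_field(records, field, value):
    """Step 2: differentiate within a class using a coded field value."""
    return [rec for rec in records if rec.get(field) == value]


groups = group_by_class(docs)
acme_contracts = filter_by_field(groups["contract_A"], "customer_name", "Acme")
acme_ids = [rec["id"] for rec in acme_contracts]
```

With the sample data, `acme_ids` holds the two Acme contracts (D1 and D4), while the well log stays in its own class and is never considered.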
04-14-2014 09:48 PM
OCR is the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into machine-encoded, computer-readable text. A reliable OCR reader gives users fast and accurate image recognition, converting scanned images into searchable text formats such as PDF, PDF/A, Word, and other document formats, and almost all image formats can be detected and recognized by the OCR control. An OCR scanner can also batch-recognize and process large volumes of images and documents in over 40 languages and character sets.