
Reading unstructured data into SAS Dataflux

Occasional Contributor
Posts: 19

Hi,

Looking at the Data Inputs node in DataFlux v2.5, there is an option to read unstructured data using the Document Conversion option.

We have a business requirement to read unstructured data that is held in .TIFF files. This format isn't mentioned in the help documents. Does anybody have any experience of reading this format into SAS DataFlux?

I don't have much knowledge of reading unstructured data, but according to Wikipedia, .TIFF appears to be an image format whose specification is owned by Adobe.

All tips, plus any other relevant information, are most appreciated.

Thanks

Nigel

PROC Star
Posts: 1,095

Re: Reading unstructured data into SAS Dataflux

In my experience, .TIFF files are used for images, in which case I can't imagine any data processing tool that would import them as data.

1. Can you import them into anything else as data?

2. If they are images, are they images of records? If so, can you process them with OCR software?
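
Before looking at OCR tools, it may be worth confirming that the files really are TIFF images. A classic TIFF file starts with the bytes `II` (little-endian) or `MM` (big-endian), followed by the number 42 in that byte order. The snippet below is only a minimal sketch of that check in Python; the function names are made up for illustration, it does not handle the BigTIFF variant, and it has nothing to do with DataFlux itself.

```python
import struct


def tiff_byte_order(header: bytes):
    """Return 'little' or 'big' for a classic TIFF header, else None.

    Hypothetical helper: checks only the first 4 bytes of the file.
    """
    if len(header) < 4:
        return None
    if header[:2] == b"II":      # Intel order, little-endian
        order = "<"
    elif header[:2] == b"MM":    # Motorola order, big-endian
        order = ">"
    else:
        return None
    # The next two bytes hold the magic number 42 in the declared byte order.
    (magic,) = struct.unpack(order + "H", header[2:4])
    if magic != 42:
        return None
    return "little" if order == "<" else "big"


def is_tiff_file(path):
    """Return True if the file at `path` begins with a classic TIFF header."""
    with open(path, "rb") as f:
        return tiff_byte_order(f.read(4)) is not None
```

If this check fails on the files you have been given, they are probably stored in some other (possibly vendor-specific) format, and a generic TIFF or OCR tool is unlikely to read them directly.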

Tom

Occasional Contributor
Posts: 19

Re: Reading unstructured data into SAS Dataflux

Hi TomKari,

Thanks for the advice. You really have pointed me in the right direction (and sorry if I am asking the obvious, but this is new to me).

After asking for more details, it turns out that what I am looking to extract are document copies (i.e. old insurance policy documents), which are being held as a 'zyimage', a sort of scanned image.

Looking at the ZyIMAGE website (ZyIMAGE Software - Document Management and Imaging Software Directory), it seems that ZyIMAGE has some sort of proprietary reader (which incorporates 'fuzzy' logic). This could be the way forward.

regards

Nigel
