12-17-2015 04:07 PM
I am mining my file system for metadata, and in this iteration I am looking at all the old txt files. So in this case I am trying to read txt files where both the delimiter and the length are unknown and differ from file to file. Is there a preferred method (function) for this? I was thinking of reading a whole line at a time, but am open to ideas. -Keith
(I have already done the work for many other file extensions with easily extractable metadata)
12-17-2015 11:18 PM
I would define a small set of possible delimiters and do a character count for each candidate delimiter within the first, say, 20 lines of the file. If none of the delimiters appears, I would assume that the file is delimited by spaces.
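As a rough illustration of the counting idea, here is a minimal Python sketch (the thread does not name a language, so Python is my assumption). The candidate set, the function name `guess_delimiter`, and its parameters are all hypothetical choices, not anything prescribed above.

```python
from itertools import islice

# Hypothetical candidate set; adjust to whatever your files are likely to use.
CANDIDATES = [",", "\t", ";", "|"]

def guess_delimiter(lines, candidates=CANDIDATES, max_lines=20, default=" "):
    """Pick the candidate that occurs most often in the first max_lines lines.

    Falls back to `default` (a space) if no candidate appears at all.
    """
    sample = list(islice(lines, max_lines))
    # Total occurrences of each candidate across the sampled lines.
    counts = {d: sum(line.count(d) for line in sample) for d in candidates}
    best = max(candidates, key=lambda d: counts[d])
    return best if counts[best] > 0 else default
```

Since an open file yields lines when iterated, something like `with open(path) as f: delim = guess_delimiter(f)` should work without reading the whole file (`path` being a placeholder for your own file name).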
Another, more subtle approach would use the fact that a true delimiter should occur the same number of times on every line.
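One possible way to turn the per-line consistency idea into code is sketched below in Python (again my assumption for the language). The scoring rule, the fraction of sampled lines that share the most common non-zero count, is just one simple heuristic I picked for illustration, and `guess_delimiter_by_consistency` is a hypothetical name.

```python
from collections import Counter
from itertools import islice

def guess_delimiter_by_consistency(lines, candidates=(",", "\t", ";", "|", " "),
                                   max_lines=20):
    """Pick the candidate whose per-line count is most consistent.

    Returns None if the sample is empty or no candidate occurs.
    """
    sample = [ln.rstrip("\r\n") for ln in islice(lines, max_lines)]
    sample = [ln for ln in sample if ln]  # ignore blank lines
    best, best_score = None, 0.0
    for d in candidates:
        nonzero = [c for c in (ln.count(d) for ln in sample) if c > 0]
        if not nonzero:
            continue
        # How many lines share the most common non-zero count?
        _, freq = Counter(nonzero).most_common(1)[0]
        score = freq / len(sample)
        if score > best_score:
            best, best_score = d, score
    return best
```

For a sample such as `["the quick brown fox|1", "hi|2", "a b|3"]`, spaces are more numerous overall, but the pipe appears exactly once on every line, so this heuristic prefers the pipe where a plain count would prefer the space.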