I am mining my file system for metadata, and in this iteration I am looking at all the old txt files. So I am trying to read txt files that each have an unknown delimiter and an unknown length. Is there a preferred method (function) for this? I was thinking of reading a whole line at a time, but I am open to ideas. -Keith
(I have already done the work for many other file extensions with easily extractable metadata)
I would define a small set of possible delimiters and do a character count for each candidate delimiter within the first, say, 20 lines of the file. If none of the delimiters appears, I would assume that the file is delimited by spaces.
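As a rough, language-neutral illustration of that counting idea, here is a minimal Python sketch. The candidate set, the function names, and the tie-breaking rule (the first candidate listed wins) are assumptions made for the example, not anything prescribed in this thread.

```python
from itertools import islice

# Assumed candidate set; adjust it to whatever delimiters you expect to meet.
CANDIDATES = (",", "\t", ";", "|")


def guess_delimiter_by_count(lines, candidates=CANDIDATES, max_lines=20):
    """Return the candidate seen most often in the first max_lines lines.

    Falls back to a single space when no candidate appears at all.
    """
    sample = list(islice(lines, max_lines))
    counts = {d: sum(line.count(d) for line in sample) for d in candidates}
    best = max(counts, key=counts.get)  # ties go to the earliest candidate
    return best if counts[best] > 0 else " "


def guess_delimiter_for_file(path):
    """Hypothetical convenience wrapper: sample the top of a text file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return guess_delimiter_by_count(handle)
```

Because only the first 20 lines are read, the cost per file stays small even when the file itself is very long.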
Another, more subtle approach would use the fact that a true delimiter should occur the same number of times on every line.
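The consistency idea could be sketched like this in Python. The scoring rule here is an assumption chosen for illustration: each candidate is scored by the share of lines that agree on its most common nonzero count, and the candidate list and function name are hypothetical.

```python
from collections import Counter


def guess_delimiter_by_consistency(lines, candidates=(",", "\t", ";", "|", " ")):
    """Return the candidate whose per-line count is most consistent.

    Score = (share of lines agreeing on the most common nonzero count,
    that count). Returns None when no candidate appears on any line.
    """
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    best, best_score = None, (0.0, 0)
    for delim in candidates:
        per_line = Counter(row.count(delim) for row in rows)
        per_line.pop(0, None)  # lines without the character tell us nothing
        if not per_line:
            continue
        count, n_lines = per_line.most_common(1)[0]
        score = (n_lines / len(rows), count)
        if score > best_score:
            best, best_score = delim, score
    return best


sample = ["a,b,c,d;x\n", "e;y\n", "f,g;z\n"]
print(repr(guess_delimiter_by_consistency(sample)))
```

In this sample the comma occurs more often in total, but its count varies from line to line, while the semicolon appears exactly once on every line, so the semicolon is chosen.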