I am mining my file system for metadata, and in this iteration I am looking at all of my old txt files. That means reading txt files where each file has an unknown delimiter and an unknown length. Is there a preferred method (function) for this? I was thinking of reading a whole line at a time, but I am open to ideas. -Keith
(I have already done the work for many other file extensions with easily extractable metadata)
I would define a small set of possible delimiters and count the occurrences of each candidate within the first, say, 20 lines of the file, then take the candidate that occurs most often. If none of the candidates appears, I would assume that the file is delimited by spaces.
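To make the counting idea concrete, here is a minimal sketch in Python (chosen only for illustration, the logic is language-neutral). The candidate list, the function name `guess_delimiter_by_count`, and the 20-line default are assumptions for this example, not a fixed recipe.

```python
from itertools import islice

# Hypothetical candidate set; adjust to whatever your files are likely to use.
CANDIDATES = (",", "\t", ";", "|")

def guess_delimiter_by_count(lines, candidates=CANDIDATES, max_lines=20):
    """Return the candidate that occurs most often in the first max_lines lines.

    `lines` can be any iterable of strings, including an open text file.
    Falls back to a single space when no candidate appears at all.
    """
    sample = list(islice(lines, max_lines))
    # Total occurrences of each candidate across the sampled lines.
    totals = {d: sum(line.count(d) for line in sample) for d in candidates}
    best = max(totals, key=totals.get)  # ties go to the earlier candidate
    return best if totals[best] > 0 else " "
```

Because the function only needs an iterable of lines, an open file handle can be passed in directly, and only the first 20 lines are read regardless of how long the file is.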
Another, more subtle approach would use the fact that the true delimiter should occur the same number of times on every line, whereas a character that merely appears in the text will vary from line to line.
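A rough sketch of that consistency check, again in Python purely for illustration: for each candidate, find the most common non-zero per-line count and score the candidate by the share of sampled lines that have exactly that count. The function name `guess_delimiter_by_consistency`, the candidate list, and the scoring rule are assumptions made for this example.

```python
from collections import Counter
from itertools import islice

def guess_delimiter_by_consistency(lines, candidates=(",", "\t", ";", "|", " "),
                                   max_lines=20):
    """Return the candidate whose per-line count is most consistent, or None.

    `lines` can be any iterable of strings, including an open text file.
    """
    sample = [ln.rstrip("\r\n") for ln in islice(lines, max_lines)]
    sample = [ln for ln in sample if ln]  # skip blank lines
    best, best_score = None, 0.0
    for d in candidates:
        # How many lines contain d exactly k times, for each k.
        per_line = Counter(ln.count(d) for ln in sample)
        per_line.pop(0, None)  # lines without d say nothing in its favour
        if not per_line:
            continue
        _, n_lines = per_line.most_common(1)[0]
        score = n_lines / len(sample)  # share of lines agreeing on one count
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = d, score
    return best
```

For the lines `a|b, c, d, e|f`, `g|h|i`, and `j|k, l, m, n, o|p`, commas outnumber pipes overall (7 to 6), so a plain total count would pick the comma, but only the pipe appears the same number of times on every line, so this check returns `|`.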