kjohnsonm
Lapis Lazuli | Level 10

I am mining my file system for metadata, and in this iteration I am looking at all the old txt files. That means reading txt files that each have an unknown delimiter and an unknown length. Is there a preferred method (function) for this? I was thinking of reading a whole line at a time, but I am open to ideas. -Keith

(I have already done the work for many other file extensions with easily extractable metadata)

2 REPLIES
Reeza
Super User
Run PROC IMPORT, then reuse the DATA step code it writes to the log. Let the computer take the first guess.
PGStats
Opal | Level 21

I would define a small set of possible delimiters and do a character count for each candidate delimiter within the first, say, 20 lines of the file. If none of the delimiters appears, I would assume that the file is delimited by spaces.
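The counting idea can be sketched roughly as follows. This is a language-neutral illustration in Python rather than SAS code; the candidate list, the function name `guess_delimiter`, and the rule of picking the most frequent candidate are assumptions made for the example, not part of the suggestion above.

```python
from itertools import islice

# Hypothetical candidate set; adjust it to the delimiters you expect to meet.
CANDIDATES = [",", "\t", ";", "|"]

def guess_delimiter(lines, candidates=CANDIDATES, max_lines=20):
    """Guess a delimiter from the first max_lines lines of an iterable of strings."""
    sample = list(islice(lines, max_lines))
    # Total number of occurrences of each candidate across the sample.
    counts = {d: sum(line.count(d) for line in sample) for d in candidates}
    # Assumption: the most frequent candidate wins.
    best = max(counts, key=counts.get)
    # If no candidate appears at all, assume the file is space-delimited.
    return best if counts[best] > 0 else " "
```

An open file object can be passed directly, for example `guess_delimiter(open(path, errors="replace"))`, since iterating over a text file yields one line at a time.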

 

A more subtle approach would use the fact that a true delimiter should occur the same number of times on every line.
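A minimal sketch of that consistency check, again in Python for illustration only. The function name `consistent_delimiters`, the candidate tuple, the 20-line sample, and the choice to skip blank lines are all assumptions of the example. It does not account for delimiters that appear inside quoted fields.

```python
from itertools import islice

def consistent_delimiters(lines, candidates=(",", "\t", ";", "|", " "), max_lines=20):
    """Return the candidates whose per-line count is identical and nonzero."""
    # Take a sample, strip line endings, and ignore blank lines.
    sample = [ln.rstrip("\r\n") for ln in islice(lines, max_lines)]
    sample = [ln for ln in sample if ln]
    result = []
    for d in candidates:
        # Distinct per-line counts of this candidate across the sample.
        per_line = {ln.count(d) for ln in sample}
        # Keep it only if every line has the same count and that count is not zero.
        if len(per_line) == 1 and 0 not in per_line:
            result.append(d)
    return result
```

If the result holds exactly one candidate, that is a strong guess. An empty result or several survivors suggests falling back to the simple count or inspecting the file by hand.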

PG
