kjohnsonm
Lapis Lazuli | Level 10

I am mining my file system for metadata, and in this iteration I am looking at all of the old txt files. Each file may have a different, unknown delimiter and an unknown length. Is there a preferred method (function) for reading files like this? I was thinking of reading a whole line at a time, but am open to ideas. -Keith

(I have already done the work for many other file extensions with easily extractable metadata)

2 REPLIES
Reeza
Super User
Use PROC IMPORT and then reuse the code it generates. Let the computer take the first guess.
PGStats
Opal | Level 21

I would define a small set of possible delimiters and do a character count for each candidate delimiter within the first, say, 20 lines of the file. If none of the delimiters appears, I would assume that the file is delimited by spaces.
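
The counting idea can be sketched as follows. This is a minimal illustration in Python rather than SAS, under stated assumptions: the candidate set (comma, tab, semicolon, pipe) and the function name `guess_delimiter` are hypothetical choices, not something taken from this thread. In a SAS DATA step the same tally could probably be built with the COUNTC function.

```python
from collections import Counter
from itertools import islice

# Assumed candidate set; adjust it to whatever delimiters are plausible for your files.
CANDIDATES = [",", "\t", ";", "|"]


def guess_delimiter(lines, candidates=CANDIDATES, max_lines=20):
    """Return the candidate that occurs most often in the first max_lines lines.

    Falls back to a single space when no candidate appears at all.
    Ties are resolved in favour of the candidate listed first.
    """
    counts = Counter()
    for line in islice(lines, max_lines):
        for delim in candidates:
            counts[delim] += line.count(delim)

    # Pick the candidate with the highest total count over the sample.
    best = max(candidates, key=lambda d: counts[d])
    return best if counts[best] > 0 else " "
```

To try it on a real file, pass the open file handle, for example `with open(path, errors="replace") as fh: delim = guess_delimiter(fh)`, where `path` is whichever txt file is being inspected. Only the first 20 lines are read, so the check stays cheap even for large files.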

 

Another, more subtle approach would use the fact that a true delimiter should occur the same number of times on every line.
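
A minimal sketch of that consistency check, again in Python and under assumptions: the candidate set and the name `consistent_delimiters` are hypothetical, blank lines are skipped, and quoted fields that contain the delimiter are not handled, so treat the result as a first guess only.

```python
from itertools import islice


def consistent_delimiters(lines, candidates=(",", "\t", ";", "|"), max_lines=20):
    """Map each candidate to its per-line count, keeping only candidates that
    occur the same non-zero number of times on every non-blank sampled line."""
    sample = [ln.rstrip("\r\n") for ln in islice(lines, max_lines)]
    sample = [ln for ln in sample if ln]  # ignore blank lines

    result = {}
    for delim in candidates:
        per_line = {ln.count(delim) for ln in sample}
        if len(per_line) == 1:  # identical count on every sampled line
            (n,) = per_line
            if n > 0:
                result[delim] = n
    return result
```

If exactly one candidate comes back, it is a strong guess for the delimiter, and its count plus one suggests the number of columns. If nothing comes back, the file may be space-delimited, fixed-width, or free text.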

PG

