Import multiple text files with different variable names and uneven ro...

Zeus_Olympous · Posted 01-23-2017 09:41 AM

Dear all,

I have multiple text files with the "Variable names" Row not fixed (i.e. it might be row=7 in the first file, row=9 in the second file etc.).

Ofcourse I can import separately with SAS EG but that becomes very much time - consuming since there are a few text files to be imported (with delimeter=' ; ').

Moreover my Variables for each text file may not be the same across text files (e.g. in the one text file it might be

VAR1

VAR2

VAR3

VAR4

VAR5

VAR6

VAR7

in the next different names

FLD1

FLD2

FLD3

FLD4

FLD5

FLD6

FLD7

or even unequal number of variables

PRT1

PRT2

PRT3

PRT4

PRT5

PRT6

PRT7

PRT8

PRT9

PRT10

Any suggestions will be more than appreciated

Thank you

Hello
Bye
See you
Hotdog	: time 56:99
Nice
Good		Bad
VAR1	VAR2	VAR3	VAR4	VAR5	VAR6	VAR7
123	124	125	126	127	128	129
123	124	125	126	127	128	129

uuu
jjhjh		jhhhjh
fgf	: time 56:99
uiub
koloop		gygy
eurftrjtj
PPPPP	"###new#"
SSXXXW
FLD1	FLD2	FLD3	FLD4	FLD5	FLD6	FLD7
123	124	125	126	127	128	129
123	124	125	126	127	128	129

RW9 · Posted 01-23-2017 09:57 AM

And how do you intend to process that mess? I mean there if there is zero consistency a) internally to that data, and b) with standard delimited file structure? Best advice, dump Excel - which your obviously using here - and get the "data" in a proper format, properly documented with a data transfer agreement stating structure and content.

You could use proc import - this is basically a guessing procedure which looks at the file and guesses what the data is, then imports it to a dataset using that guess. You would of course need to know at least what row the data starts in, delimited files should have column headers on the first row, if not then you need to put this information in. That being said, what you will end up with is a mass of best guess datasets which may or may not reflect your original data. Garbage in=garbage out and all that.

Kurt_Bremser · Posted 01-23-2017 10:08 AM

Garbage like this can only be fixed by using _your_ brains. Trying to delegate the unraveling of this puzzle to the computer will only end in failure.

The only solution is to go back to the data source and tell them to deliver their data in a fixed, reliable form. Then you can start to automate the process.

You should NEVER accept native Excel files as input. Because on top of the vagaries of the users, you get the vagaries of Microsoft.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 01-23-2017 10:55 AM

This is the kind of thing I would expect to get paid $75 or more per hour to fix.

My generic solution is :

1) decide what information you need from this stuff and it's basic format: numeric/character

2) deciede on properties such as variable name, label and format

3) find where those things are in the files.

I would start by looking for commonalities between files. You may get lucky and have some in the same format.

Then see if I can set parameters that tell me where the key items are such as the "time" row/colum and similar combinations and the row position of the 'tabular' data you show.

You will get to write some obnoxious data step and/or macro code involving complex input statements with / and @ options to read multiple lines or hold the read pointer while determining if the row/column is a key row/column combination.

Import multiple text files with different variable names and uneven row order

Re: Import multiple text files with different variable names and uneven row order

Re: Import multiple text files with different variable names and uneven row order

Re: Import multiple text files with different variable names and uneven row order

Import multiple text files with different variable names and uneven row order

Re: Import multiple text files with different variable names and uneven row order

Re: Import multiple text files with different variable names and uneven row order

Re: Import multiple text files with different variable names and uneven row order

Registration is open

SAS Training: Just a Click Away