Dear all,
I have multiple text files with the "Variable names" Row not fixed (i.e. it might be row=7 in the first file, row=9 in the second file etc.).
Ofcourse I can import separately with SAS EG but that becomes very much time - consuming since there are a few text files to be imported (with delimeter=' ; ').
Moreover my Variables for each text file may not be the same across text files (e.g. in the one text file it might be
VAR1 | VAR2 | VAR3 | VAR4 | VAR5 | VAR6 | VAR7 |
in the next different names
FLD1 | FLD2 | FLD3 | FLD4 | FLD5 | FLD6 | FLD7 |
or even unequal number of variables
PRT1 | PRT2 | PRT3 | PRT4 | PRT5 | PRT6 | PRT7 | PRT8 | PRT9 | PRT10 |
Any suggestions will be more than appreciated
Thank you
Hello | ||||||
Bye | ||||||
See you | ||||||
Hotdog | : time 56:99 | |||||
Nice | ||||||
Good | Bad | |||||
VAR1 | VAR2 | VAR3 | VAR4 | VAR5 | VAR6 | VAR7 |
123 | 124 | 125 | 126 | 127 | 128 | 129 |
123 | 124 | 125 | 126 | 127 | 128 | 129 |
uuu | ||||||
jjhjh | jhhhjh | |||||
fgf | : time 56:99 | |||||
uiub | ||||||
koloop | gygy | |||||
eurftrjtj | ||||||
PPPPP | "###new#" | |||||
SSXXXW | ||||||
FLD1 | FLD2 | FLD3 | FLD4 | FLD5 | FLD6 | FLD7 |
123 | 124 | 125 | 126 | 127 | 128 | 129 |
123 | 124 | 125 | 126 | 127 | 128 | 129 |
And how do you intend to process that mess? I mean there if there is zero consistency a) internally to that data, and b) with standard delimited file structure? Best advice, dump Excel - which your obviously using here - and get the "data" in a proper format, properly documented with a data transfer agreement stating structure and content.
You could use proc import - this is basically a guessing procedure which looks at the file and guesses what the data is, then imports it to a dataset using that guess. You would of course need to know at least what row the data starts in, delimited files should have column headers on the first row, if not then you need to put this information in. That being said, what you will end up with is a mass of best guess datasets which may or may not reflect your original data. Garbage in=garbage out and all that.
Garbage like this can only be fixed by using _your_ brains. Trying to delegate the unraveling of this puzzle to the computer will only end in failure.
The only solution is to go back to the data source and tell them to deliver their data in a fixed, reliable form. Then you can start to automate the process.
You should NEVER accept native Excel files as input. Because on top of the vagaries of the users, you get the vagaries of Microsoft.
This is the kind of thing I would expect to get paid $75 or more per hour to fix.
My generic solution is :
1) decide what information you need from this stuff and it's basic format: numeric/character
2) deciede on properties such as variable name, label and format
3) find where those things are in the files.
I would start by looking for commonalities between files. You may get lucky and have some in the same format.
Then see if I can set parameters that tell me where the key items are such as the "time" row/colum and similar combinations and the row position of the 'tabular' data you show.
You will get to write some obnoxious data step and/or macro code involving complex input statements with / and @ options to read multiple lines or hold the read pointer while determining if the row/column is a key row/column combination.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.