BookmarkSubscribeRSS Feed
Zeus_Olympous
Obsidian | Level 7

Dear all,

 

I have multiple text files with the "Variable names" Row not fixed (i.e. it might be row=7 in the first file, row=9 in the second file etc.).

 

Ofcourse I can import separately with SAS EG but that becomes very much time - consuming since there are a few text files to be imported (with delimeter=' ; ').

 

Moreover my Variables for each text file may not be the same across text files (e.g. in the one text file it might be

VAR1VAR2VAR3VAR4VAR5VAR6VAR7

 

in the next different names

FLD1FLD2FLD3FLD4FLD5FLD6FLD7

 

or even unequal number of variables

 

PRT1PRT2PRT3PRT4PRT5PRT6PRT7PRT8PRT9PRT10

 

Any suggestions will be more than appreciated

 

Thank you

 

 

Hello      
Bye      
See you      
Hotdog: time 56:99     
Nice      
Good Bad    
VAR1VAR2VAR3VAR4VAR5VAR6VAR7
123124125126127128129
123124125126127128129

 

uuu      
jjhjh jhhhjh    
fgf: time 56:99     
uiub      
koloop gygy    
eurftrjtj      
PPPPP"###new#"     
SSXXXW      
FLD1FLD2FLD3FLD4FLD5FLD6FLD7
123124125126127128129
123124125126127128129
3 REPLIES 3
RW9
Diamond | Level 26 RW9
Diamond | Level 26

And how do you intend to process that mess?  I mean there if there is zero consistency a) internally to that data, and b) with standard delimited file structure?  Best advice, dump Excel - which your obviously using here - and get the "data" in a proper format, properly documented with a data transfer agreement stating structure and content.  

You could use proc import - this is basically a guessing procedure which looks at the file and guesses what the data is, then imports it to a dataset using that guess.  You would of course need to know at least what row the data starts in, delimited files should have column headers on the first row, if not then you need to put this information in.  That being said, what you will end up with is a mass of best guess datasets which may or may not reflect your original data.  Garbage in=garbage out and all that.

Kurt_Bremser
Super User

Garbage like this can only be fixed by using _your_ brains. Trying to delegate the unraveling of this puzzle to the computer will only end in failure.

The only solution is to go back to the data source and tell them to deliver their data in a fixed, reliable form. Then you can start to automate the process.

You should NEVER accept native Excel files as input. Because on top of the vagaries of the users, you get the vagaries of Microsoft.

ballardw
Super User

This is the kind of thing I would expect to get paid $75 or more per hour to fix.

 

My generic solution is :

1) decide what information you need from this stuff and it's basic format: numeric/character

2) deciede on properties such as variable name, label and format

3) find where those things are in the files.

   I would start by looking for commonalities between files. You may get lucky and have some in the same format.

   Then see if I can set parameters that tell me where the key items are such as the "time" row/colum and similar combinations and the row position of the 'tabular' data you show.

 

You will get to write some obnoxious data step and/or macro code involving complex input statements with / and @ options to read multiple lines or hold the read pointer while determining if the row/column is a key row/column combination.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 891 views
  • 2 likes
  • 4 in conversation