
Reading raw files

Occasional Contributor
Posts: 9

Reading raw files

Hello all,

Is there any difference between these two pieces of code?

   data work.sfosch;
      infile '/folders/myfolders/sasuser.v94/prog1data/sfosch.txt';
      input FlightID $ 1-7 RouteID $ 8-14 Destination $ 18-20;
   run;

and 

  proc import out=work.sfoch
    datafile='/folders/myfolders/sasuser.v94/prog1data/sfosch.txt'
    dbms=tab;
  run;

As I understand it, the first code works when the raw data has no specific delimiter, while the second can only be used if the raw data is tab-delimited. Am I correct?

 

Thanks.

 

Super User
Posts: 9,599

Re: Reading raw files

Posted in reply to BURHAN_CIGDEM

Nope. In your first step you explicitly state which variables to read, where each one starts, and how many characters to read. No delimiter is involved: it reads characters 1-7 and puts them into the first variable.

 

The second, proc import, is a guessing procedure. You're leaving it entirely up to the software to guess the structure of your data. Sure, the delimiter it looks for is a tab, and if characters 1-7 hold data and are followed by a tab, you should get the same result, but proc import is still guessing and may get it wrong. If, however, there are 8 characters and then a tab, the proc import variable will hold all 8 characters, whereas the first code would read only 7 of them.
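To make the 8-character case concrete, here is a minimal sketch. The temporary file, its contents, and the data set names are made up for illustration; only the column range 1-7 comes from the original post.

```sas
/* Hypothetical demo file: an 8-character ID, a tab, then a route. */
filename demo temp;

data _null_;
   file demo;
   put 'IA112001' '09'x 'ROUTE01';
run;

/* Column input: always reads columns 1-7, whatever follows. */
data work.by_column;
   infile demo;
   input FlightID $ 1-7;
run;

/* Delimited input: reads up to the tab, so all 8 characters are kept. */
data work.by_delimiter;
   infile demo dlm='09'x;
   length FlightID $ 8;
   input FlightID $;
run;
```

In this sketch work.by_column should hold IA11200 and work.by_delimiter should hold IA112001.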

Occasional Contributor
Posts: 9

Re: Reading raw files

I guess that if there is a delimiter in the raw data (e.g. a tab), the second code is easier to write. But in the first code you depend on the variables' lengths, positions, etc.
Super User
Posts: 10,280

Re: Reading raw files

Posted in reply to BURHAN_CIGDEM

Yes, there is a fundamental difference, as in the first code you decide how data is read, while in the second you rely on the guesses that proc import has to make.

Second, in the data step you use specific positions for your columns, while in the proc import SAS will write a data step that uses dlm='09'x in the infile statement, and scans dynamically for delimiters in each input line.
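As a rough, hand-written approximation of that kind of step (not the exact code proc import generates, which also contains informat and format statements), and assuming the file really is tab-delimited with no header line, it might look like this. The variable lengths are assumptions.

```sas
data work.sfosch;
   /* dlm='09'x scans each line for tabs; dsd treats consecutive   */
   /* tabs as a missing value; truncover tolerates short lines.    */
   infile '/folders/myfolders/sasuser.v94/prog1data/sfosch.txt'
          dlm='09'x dsd truncover;
   /* Lengths are assumed here; proc import would guess them. */
   length FlightID $ 7 RouteID $ 7 Destination $ 3;
   input FlightID $ RouteID $ Destination $;
run;
```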

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 9

Re: Reading raw files

Posted in reply to KurtBremser
Well, as long as there is a delimiter in the raw data, isn't proc import enough for us? Can I trust the proc import step, or should I write my own data step?
Super User
Posts: 13,583

Re: Reading raw files

Posted in reply to BURHAN_CIGDEM

@BURHAN_CIGDEM wrote:
Well, as long as there is a delimiter in the raw data, isn't proc import enough for us? Can I trust the proc import step, or should I write my own data step?

If you read multiple files that should share the same layout, you will find that, unless the data is exceptionally cleanly formatted, the lengths of character variables are likely to change from file to file. When the data sets are combined, you will then get warnings about potentially truncated variables, and sometimes actually truncated values.

 

 

If your data files have values that mix all-digit entries with entries containing digits plus alphabetic characters, as happens with zip codes (12345 and 12345-4432 with zip plus 4), account numbers (123456789 or MB988847), or product codes (12345 or 9-12345), you may find that a variable changes type between data sets. Attempts to combine those data sets will then fail, because a variable can only be of one type.
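One way to avoid the type problem is to declare such fields as character yourself in a data step. This is a minimal sketch with a made-up temporary file; the variable names Zip and Account and their lengths are assumptions for illustration.

```sas
/* Hypothetical tab-delimited file mixing all-digit and mixed values. */
filename zips temp;

data _null_;
   file zips;
   put '12345' '09'x '123456789';
   put '12345-4432' '09'x 'MB988847';
run;

/* Declaring both fields as character keeps the type and length */
/* the same for every file read with this step.                 */
data work.accounts;
   infile zips dlm='09'x dsd truncover;
   length Zip $ 10 Account $ 12;
   input Zip $ Account $;
run;
```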

 

You may even have variables change names depending on how clean your data supplier is. I worked with one client that asked why we had monthly billings for changing code. They would change the order and names of columns constantly (2 or 3 times a month) in the files they sent us. Proc Import in that case would have created multiple data sets with different variables requiring additional steps to get all of the "productname" variables into one variable of a useable length.

 

Proc Import guesses. If you are going to use Proc Import for delimited files, I recommend habitually using the GUESSINGROWS option to give the best chance of good data. Otherwise SAS examines only the first 20 rows of data to determine variable types and lengths. If a value is populated only occasionally, like "fifth child name", and it doesn't appear in the first 20 lines, you are likely to end up with a one-character variable that truncates the names that actually occur later.
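A sketch of what that could look like for the file from the original question, assuming it is tab-delimited. The row count of 32767 is just an illustrative value, and the replace option is an addition of mine so the step can be rerun.

```sas
proc import datafile='/folders/myfolders/sasuser.v94/prog1data/sfosch.txt'
            out=work.sfosch
            dbms=tab
            replace;              /* overwrite work.sfosch if it exists */
   guessingrows=32767;            /* scan more rows than the default 20 */
run;
```

Scanning more rows makes the import slower on large files, but it reduces the chance of a wrong type or a truncated length.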

Super User
Posts: 10,280

Re: Reading raw files

Posted in reply to BURHAN_CIGDEM

Since proc import creates a data step for text files, you can use it to quickly get code which you can then adapt to the file specification.

With experience, you will find that writing such steps directly is quicker for you.
