Hello all,
Is there any difference between these two pieces of code?
data work.sfosch;
infile '/folders/myfolders/sasuser.v94/prog1data/sfosch.txt';
input FlightID $1-7 RouteID $8-14 Destination $18-20;
run;
and
proc import out=work.sfosch
datafile='/folders/myfolders/sasuser.v94/prog1data/sfosch.txt'
dbms=tab;
run;
As I understand it, the first program works when there is no specific delimiter, while the second can only be used when the raw data is tab-delimited. Am I correct?
Thanks.
Nope. In your first step you explicitly state which variables to read, where they start, and how long they are. No delimiter is involved: SAS reads characters 1-7 into the first variable, characters 8-14 into the second, and so on.
The second, PROC IMPORT, is a guessing procedure: you leave it entirely up to the software to guess the structure of your data. Sure, the delimiter it looks for is tab, and if characters 1-7 hold data followed by a tab, you should get the same result, but PROC IMPORT is still guessing and may get it wrong. If, however, there are 8 characters before the tab, the imported variable will hold all 8, while the first program would read only 7 of them.
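A minimal sketch of the 8-character case, assuming a made-up two-line tab-delimited file written to a temporary fileref (the fileref DEMO and all values are hypothetical). Column input, as in the first program, takes columns 1-7 regardless of where the tab is; list input with DLM='09'x reads up to the tab.

```sas
/* Write a tiny tab-delimited file; '09'x is the tab character. */
filename demo temp;

data _null_;
  file demo;
  length line $40;
  line = 'ABCDEFGH' || '09'x || 'R01';   /* 8-character ID */
  put line;
  line = 'ABCDEFG' || '09'x || 'R02';    /* 7-character ID */
  put line;
run;

/* Column input: always columns 1-7, so FlightID is ABCDEFG on both rows. */
data col_input;
  infile demo truncover;
  input FlightID $1-7;
run;

/* List input: reads up to the tab, so the first ID keeps all 8 characters. */
data list_input;
  infile demo dlm='09'x dsd truncover;
  length FlightID $8 RouteID $3;
  input FlightID $ RouteID $;
run;
```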
Yes, there is a fundamental difference: in the first program you decide how the data is read, while in the second you rely on the guesses that PROC IMPORT has to make.
Second, in the DATA step you use fixed column positions, while PROC IMPORT writes a DATA step that uses dlm='09'x in the INFILE statement and scans each input line dynamically for delimiters.
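For comparison, here is a hand-written sketch of the kind of DATA step PROC IMPORT generates for a tab-delimited file with a header row. The exact options and informats vary by release and by what it guesses, so check the code in your own log; the fileref SFO, the header names and the sample record are assumptions.

```sas
/* Hypothetical tab-delimited file: a header row plus one record. */
filename sfo temp;

data _null_;
  file sfo;
  length line $60;
  line = 'FlightID' || '09'x || 'RouteID' || '09'x || 'Destination';
  put line;
  line = 'IA10200' || '09'x || 'R000102' || '09'x || 'ANC';
  put line;
run;

/* Roughly the shape of the generated step:
   DELIMITER='09'x  makes tab the field separator,
   DSD              treats two adjacent tabs as a missing value,
   FIRSTOBS=2       skips the header row. */
data work.sfosch;
  infile sfo delimiter='09'x missover dsd lrecl=32767 firstobs=2;
  informat FlightID $7. RouteID $7. Destination $3.;
  format   FlightID $7. RouteID $7. Destination $3.;
  input FlightID $ RouteID $ Destination $;
run;
```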
@BURHAN_CIGDEM wrote:
Well, as long as there is a delimiter in the raw data, isn't PROC IMPORT enough? Can I trust the PROC IMPORT step, or should I write my own DATA step?
If you read multiple files that should share the same layout, you will find, unless the data is exceptionally cleanly formatted, that the lengths of character variables are likely to change from file to file. This means that when the data sets are combined, you will get warnings about potentially truncated variables, and sometimes actually truncated values.
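One way to avoid the truncation, sketched with two hypothetical data sets standing in for two months of imports (the names JAN, FEB and productname are made up): declare the length yourself before the SET statement. Otherwise the combined data set takes the length from the first data set listed, and SAS warns about multiple lengths.

```sas
/* Two hypothetical monthly imports where the guessed lengths differ. */
data jan;
  length productname $5;
  productname = 'Bolts';
run;

data feb;
  length productname $12;
  productname = 'Wing nuts XL';
run;

/* The LENGTH statement comes before SET, so it fixes the length
   for the combined data set and FEB's longer value is not cut to $5. */
data combined;
  length productname $20;
  set jan feb;
run;
```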
If a column in your data files mixes all-digit values with values that also contain letters or punctuation, as happens with ZIP codes (12345 and 12345-4432 with ZIP+4), account numbers (123456789 or MB988847) or product codes (12345 or 9-12345), you may find that the variable changes type between data sets. Attempts to combine those data sets will then fail, because a variable can only be of one type.
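When you write the DATA step yourself, you decide the type. A minimal sketch, assuming a hypothetical space-delimited file holding the ZIP code and account number examples above (the fileref ZIPS and the variable names are made up): reading both fields with $ informats keeps every value character, so data sets built from different files always agree on type.

```sas
filename zips temp;

data _null_;
  file zips;
  put '12345 123456789';
  put '12345-4432 MB988847';
run;

/* The colon modifier reads each field up to the next blank,
   using the $ informat, so the variable is character in every file. */
data zipcodes;
  infile zips truncover;
  input zip :$10. account :$9.;
run;
```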
You may even have variables change names, depending on how clean your data supplier is. I worked with one client who asked why we billed them every month for changing code. They changed the order and names of the columns constantly (2 or 3 times a month) in the files they sent us. In that case PROC IMPORT would have created multiple data sets with different variables, requiring additional steps to get all of the "productname" variables into one variable of a usable length.
PROC IMPORT guesses. If you are going to use PROC IMPORT for delimited files, I recommend habitually using the GUESSINGROWS option to give it the best chance of reading the data correctly. Otherwise SAS examines only 20 rows of data to determine variable types and lengths. If a value is only sometimes populated, like "fifth child name", and it doesn't appear in the first 20 lines of data, you are likely to end up with a variable only one character long, so only the first character of the names that do occur is kept.
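A sketch of the GUESSINGROWS advice, using a made-up file written to the WORK directory in which a sparse "fifth child" column (child5, hypothetical) is empty for the first 25 rows. With the default of 20 guessing rows that column would likely get a length of 1; raising GUESSINGROWS lets PROC IMPORT see the later values. The file name kids.txt and the value 32767 are assumptions chosen for illustration.

```sas
/* Hypothetical file: header row, then 30 rows; child5 is empty in rows 1-25. */
%let kidfile = %sysfunc(pathname(work))/kids.txt;

data _null_;
  file "&kidfile";
  length line $100;
  line = 'family' || '09'x || 'child5';
  put line;
  do family = 1 to 30;
    if family <= 25 then line = cats(family) || '09'x;
    else line = cats(family) || '09'x || 'Alexandra';
    put line;
  end;
run;

/* GUESSINGROWS= tells PROC IMPORT how many rows to scan
   before deciding each variable's type and length. */
proc import datafile="&kidfile" out=work.kids dbms=tab replace;
  getnames=yes;
  guessingrows=32767;
run;
```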
Since PROC IMPORT generates a DATA step for text files and writes it to the log, you can use it to quickly get code that you then adapt to the file's specification.
With experience, you will find that writing such steps directly is quicker for you.