What is the difference between PROC IMPORT and the INFILE statement?
Which is best for reading raw data?
Please give some pros and cons.
PROC IMPORT
- Less code and easier to write
- Not as flexible or powerful
- Handles more file types, such as MS Excel spreadsheets and MS Access databases
- For text files, limited to delimited input
DATA step INFILE
- More complicated to code
- More flexible and powerful
- Can't handle MS Excel or Access files
- Can handle just about any type of text input
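To make the trade-off concrete, here is a sketch of the same hypothetical delimited file (the path, file name, and variables are made up for illustration) read both ways:

```sas
/* PROC IMPORT: short, but SAS guesses each variable's type and length */
proc import datafile='c:\temp\sales.csv'
    out=work.sales_imp
    dbms=csv
    replace;
    getnames=yes;
run;

/* DATA step INFILE: more code, but you control every attribute yourself */
data work.sales_inf;
    infile 'c:\temp\sales.csv' dsd firstobs=2 truncover;
    input region :$20. amount :comma12. saledate :yymmdd10.;
    format saledate date9.;
run;
```

The DATA step version will produce the same structure on every run regardless of the file's contents, while PROC IMPORT's result depends on what it finds in the data.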
Hi saskiwi,
Thank you for the reply.
PROC IMPORT can handle raw text (Notepad) files, but if the variable names are not in the text files, which approach is best in that case?
And how do I import multiple different raw data files into SAS at one time?
Which one is better depends on the contents of the data, as saskiwi pointed out, so I think it is case by case.
And with PROC IMPORT, you cannot import multiple files at once (it requires macros, etc.), but with INFILE, you can:
/* A wildcard FILENAME lets one DATA step read every matching .txt file */
filename f 'c:\temp\*.txt';

data sample;
    length text $2000;
    infile f;          /* the files are read one after another */
    input;             /* load each record into the input buffer */
    text = _infile_;   /* copy the raw line into a variable */
run;
DATA step INFILE is better if there are no column names and is the only option for reading in multiple files in one step.
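For example, a headerless comma-delimited file can be read by naming and typing the variables yourself in the INPUT statement (the file name and variables here are assumptions, not from the original post):

```sas
/* No header row: supply the variable names and types directly */
data work.patients;
    infile 'c:\temp\patients.txt' dlm=',' dsd truncover;
    input id :8. name :$30. weight :8.;
run;
```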
The biggest con of PROC IMPORT: it cannot provide consistent results because of the guessing it does with each run. A data step will always create the same dataset structure.
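One reason for the inconsistency is that PROC IMPORT scans only a limited number of rows before deciding on types. You can stretch the scan with the GUESSINGROWS statement (this sketch assumes a hypothetical CSV file), but it is still guessing, whereas a DATA step fixes the structure in code:

```sas
/* Scan every row before deciding variable types (SAS 9.4) */
proc import datafile='c:\temp\monthly.csv'
    out=work.monthly
    dbms=csv
    replace;
    guessingrows=max;
run;
```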
Additionally, PROC IMPORT expects the data to be in a simple row/column tabular layout: each row is one record, and each column is a different variable. If the file is any more complex, for example a group value that appears only on the first record of a group of related records, PROC IMPORT will leave the group variable(s) missing on many records, whereas a DATA step can (usually) be programmed to fill in values for each record.
PROC IMPORT makes separate decisions for each file about variable names, types, lengths, and the informats used to read the data. If you have multiple files with the same structure but different contents, using PROC IMPORT on each of them is very likely to create inconsistent data, such as different variable types for the same column or different lengths for character variables, which means that combining the data afterwards may either fail or truncate values.
If you have a text file with a document describing the variable lengths, types, and content, it is often preferable to use that information to read the file with a DATA step.
Also, with appropriate coding you can read multiple files in a single DATA step into a single dataset, which PROC IMPORT won't do.
If you have columns that should be numeric but contain occasional codes such as "NULL", "NA", "MISSING", ">10000", or similar text, PROC IMPORT will make those columns character when those values are encountered. With a DATA step you set the type. If you know all of these special values, you can create custom informats to read the columns and assign either custom values or special missing values, so you can tell afterwards that the value was missing when read.
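A custom informat for that situation can be sketched with PROC FORMAT's INVALUE statement (the informat name, file, and variables are hypothetical):

```sas
/* Map known text codes to a special missing value, read everything else as a number */
proc format;
    invalue numna
        'NULL', 'NA', 'MISSING' = .N  /* .N flags "value was a code, not a number" */
        other = [best12.];
run;

data work.labs;
    infile 'c:\temp\labs.csv' dsd firstobs=2 truncover;
    input subject :$10. result :numna.;
run;
```

The column stays numeric, and records where the source held a text code carry the special missing value .N instead of silently forcing the whole column to character.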
If your source file has multiple header rows, PROC IMPORT will typically use one for variable names and the following row(s) to set variable types, which can mean that variables that should be numeric end up as character.
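With a DATA step you can simply skip the extra header rows and keep the intended numeric types; this sketch assumes a file with two header lines (the file name and variables are made up):

```sas
/* Two header lines: skip both with FIRSTOBS=3 and declare the types yourself */
data work.readings;
    infile 'c:\temp\readings.csv' dsd firstobs=3 truncover;
    input site :$10. temp :8. humidity :8.;
run;
```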