Mrinal
Calcite | Level 5

Hi All,

I would like to know which is the more effective way to import a data file into SAS: PROC IMPORT or the INFILE statement.

What is the difference between the two in terms of efficiency or usefulness? And which one should I use if I have a large dataset, with more than 100,000 observations?

Thanks in advance.

Best regards,

Mrinal

Accepted Solutions
Reeza
Super User

A data step with INFILE lets you specify variable types and informats directly, which you can't do in PROC IMPORT.

If proc import incorrectly classifies a variable you'll have to manually fix it anyway.

I generally use a combination anyway: PROC IMPORT writes the data step code it generates, including the INFILE statement, to the log.

I'll use that code as a starting point and then manually modify the INFILE and INPUT statements to read the data.
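A minimal sketch of that workflow, assuming a comma-delimited file with a header row. The path /data/sales.csv, the data set names, and the columns id, sale_date, and amount are all hypothetical, and the second step is only an approximation of what the log would contain:

```sas
/* Step 1: let PROC IMPORT guess the layout. The data step it    */
/* generates is written to the log. Path and names are made up. */
proc import datafile="/data/sales.csv"
            out=work.sales_guess
            dbms=csv
            replace;
    getnames=yes;
run;

/* Step 2: copy the generated code from the log and edit it by  */
/* hand, for example to force id to be read as character.       */
data work.sales;
    infile "/data/sales.csv" dsd dlm=',' firstobs=2 truncover;
    informat id $10. sale_date yymmdd10. amount best32.;
    format sale_date date9.;
    input id $ sale_date amount;
run;
```

Once the second step exists, the PROC IMPORT call is only needed again if the file layout changes.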

If your data structure is likely to change, then PROC IMPORT is more likely to adapt to the new data, while the INFILE code will keep using the old structure.

5 REPLIES
Tom
Super User

It depends on how much information you have about the file's content, and on your definition of effectiveness.

If you ask PROC IMPORT to convert a text file it will first analyze the file and then generate a data step to read it. This will work well when the content is unknown to you and it is easy for PROC IMPORT to figure out what type of data is in each variable.  There is a little extra processing time for PROC IMPORT to do the analysis.
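For delimited files, how many rows that analysis looks at is set with the GUESSINGROWS statement; by default only the first rows are scanned, so a long value further down the file can be truncated. A sketch, with a hypothetical file path and data set name, and assuming a SAS release that accepts MAX (older releases need a number instead):

```sas
/* Path and output data set name are hypothetical. */
proc import datafile="/data/sales.csv"
            out=work.sales_guess
            dbms=csv
            replace;
    getnames=yes;
    guessingrows=max;  /* scan every row before choosing types and lengths */
run;
```

Scanning the whole file makes the guesses more reliable but adds to the extra processing time, which matters more as the file grows.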

If you already have information on the contents of the file you can write a more accurate data step.  It will run faster, but it might require more programmer time to create the data step.

ballardw
Super User

Other advantages of the data step come from having full access to programming statements while the data is being read.

Some things I incorporate in some of my data steps to read data:

     Check for new values for codes. I have custom formats that use an "other" formatted value of "Invalid" or similar. In the program that reads the data I check for that formatted value and put details about the record with the new value.

     Split the data into multiple data sets.

     Set lengths for character variables that I know will be combined with other data sets to avoid truncation errors and warnings.

     Create SAS date/time/datetime variables especially from delimited data that isn't amenable to INFORMAT reading.

     Standardize character variables to upper/lower or proper case

     Create new variables
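Several of these can sit in a single step. The following is only a sketch: the file path, the column layout (id, region, customer, month, day, year, amount), the $regfmt format, and the data set names are all hypothetical.

```sas
/* Custom format: any code not listed maps to "Invalid". */
proc format;
    value $regfmt
        'N'   = 'North'
        'S'   = 'South'
        other = 'Invalid';
run;

data work.north work.south;   /* split the data into multiple data sets */
    infile "/data/sales.csv" dsd dlm=',' firstobs=2 truncover;

    /* Fixed lengths avoid truncation when combining with other data sets. */
    length id $10 region $1 customer $40;
    input id region customer mm dd yyyy amount;

    /* Check for new code values and report the record in the log. */
    if put(region, $regfmt.) = 'Invalid' then
        putlog 'WARNING: unexpected region code ' region= id= _n_=;

    /* Build a SAS date from pieces that no single informat reads. */
    sale_date = mdy(mm, dd, yyyy);
    format sale_date date9.;

    /* Standardize case and create a new variable. */
    customer = propcase(customer);
    amount_k = amount / 1000;

    drop mm dd yyyy;

    if region = 'N' then output work.north;
    else if region = 'S' then output work.south;
run;
```

Records with an unrecognized region code are written to neither data set here; they only show up in the log, which is a design choice for this sketch.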

RW9
Diamond | Level 26

For my 2p's worth: when talking about text files, e.g. CSV, PROC IMPORT is just a wrapper that generates a data step with an INFILE statement and tries to be helpful. When talking about Excel or other imports, you may also need to consider SAS/ACCESS or Office drivers. Personally I would go for a fully text-based format such as CSV or XML, and a reader of your own design with full control. As a bonus, that makes it portable between systems.

Kurt_Bremser
Super User

In the long run, a manually written data step will be more effective, because it will not automatically try to adapt to changed infile structures, instead it will throw an error. That lets you detect errors much earlier in the processing chain.
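One way to make a hand-written step fail loudly is to test the automatic _ERROR_ variable, which SAS sets to 1 when a field cannot be read with its informat. This is an illustrative sketch, not necessarily how the poster does it; the file path, data set name, and column names are hypothetical.

```sas
data work.sales;
    infile "/data/sales.csv" dsd dlm=',' firstobs=2 truncover;
    informat id $10. sale_date yymmdd10. amount best32.;
    format sale_date date9.;
    input id $ sale_date amount;

    /* _ERROR_ is 1 when a value did not match its informat,        */
    /* e.g. text arriving where a date or number is expected.       */
    if _error_ then do;
        putlog 'ERROR: unexpected content in input record ' _n_=;
        abort cancel;   /* cancel the submitted code instead of loading bad data */
    end;
run;
```

Without a check like this, invalid values are set to missing and reported as notes in the log, so the step still completes.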
