
which is more effective: proc import or infile statement

Occasional Contributor
Posts: 8

Hi All,

I would like to know which is the more effective way to import a data file into SAS: PROC IMPORT or the INFILE statement.

What is the difference between the two in terms of efficiency or usefulness? And which one should I use if I have a large dataset, with more than 100,000 observations?

Thanks in advance.

Best regards,

Mrinal


Accepted Solutions
Solution
‎07-07-2017 01:07 PM
Super User
Posts: 19,789

Re: which is more effective: proc import or infile statement

An INFILE statement in a data step lets you specify variable types and formats directly, which you can't do in PROC IMPORT.

If PROC IMPORT incorrectly classifies a variable, you'll have to fix it manually anyway.

I generally use a combination of the two: PROC IMPORT writes the data step code it generates, including the INFILE statement, to the log.

I use that code as a starting point and then manually modify the INFILE and INPUT statements to read the data.

If your data structure is likely to change, then PROC IMPORT is more likely to adapt to the new data, while the INFILE code will keep using the old structure.
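As a rough illustration of that workflow, here is a minimal sketch. The file contents, the `demo` fileref, and the variable names are all made up for the example. The second step is the kind of hand-edited code you might end up with after copying the generated data step from the log; here `zip` is forced to character so its leading zero survives, since PROC IMPORT would likely guess it is numeric.

```sas
/* Hypothetical demo file written to a temporary location. */
filename demo temp;
data _null_;
  file demo;
  put 'id,zip,visit_date,amount';
  put '1,02134,2017-07-01,10.50';
  put '2,10001,2017-07-02,7.25';
run;

/* Step 1: let PROC IMPORT guess. For delimited files the  */
/* data step it generates is written to the log.           */
proc import datafile=demo out=work.guess dbms=csv replace;
  getnames=yes;
run;

/* Step 2: a hand-edited data step with explicit types. */
data work.explicit;
  infile demo dsd dlm=',' firstobs=2 truncover;
  informat id 8. zip $5. visit_date yymmdd10. amount 8.;
  format visit_date date9. amount 8.2;
  input id zip $ visit_date amount;
run;
```

Comparing `work.guess` and `work.explicit` with PROC CONTENTS shows where the guessed types differ from the ones you asked for.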


All Replies
Super User
Posts: 7,042

Re: which is more effective: proc import or infile statement

It depends on how much information you have about the file's content, and on your definition of effectiveness.

If you ask PROC IMPORT to convert a text file it will first analyze the file and then generate a data step to read it. This will work well when the content is unknown to you and it is easy for PROC IMPORT to figure out what type of data is in each variable.  There is a little extra processing time for PROC IMPORT to do the analysis.

If you already have information on the contents of the file you can write a more accurate data step.  It will run faster, but it might require more programmer time to create the data step.
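To see what the analysis step decided, you can inspect the result with PROC CONTENTS. The sketch below uses a made-up file in which the longest `comment` value only appears on the 30th data row. By default PROC IMPORT examines only the first rows of a delimited file (20, as far as I know), so the example raises GUESSINGROWS; the value 1000 is an arbitrary choice.

```sas
/* Hypothetical demo file: 30 rows, the last one has a long comment. */
filename demo temp;
data _null_;
  file demo;
  put 'id,comment';
  do id = 1 to 30;
    if id < 30 then put id +(-1) ',ok';
    else put id +(-1) ',a much longer comment';
  end;
run;

proc import datafile=demo out=work.guess dbms=csv replace;
  getnames=yes;
  guessingrows=1000;  /* scan more rows before choosing types and lengths */
run;

/* Check the types and lengths PROC IMPORT settled on. */
proc contents data=work.guess;
run;
```

Scanning more rows makes the guess more reliable but adds to the extra processing time mentioned above.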

Super User
Posts: 11,343

Re: which is more effective: proc import or infile statement

Other advantages of the data step come from having full access to programming statements while the data is being read.

Some things I incorporate in some of my data steps to read data:

     Check for new values for codes. I have custom formats that map any "other" value to "Invalid" or similar. In the program that reads the data I check for that formatted value and write out details about the record with the new value.

     Split the data into multiple data sets.

     Set lengths for character variables that I know will be combined with other data sets, to avoid truncation errors and warnings.

     Create SAS date/time/datetime variables, especially from delimited data that isn't amenable to INFORMAT reading.

     Standardize character variables to upper, lower, or proper case.

     Create new variables.
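Several of the items above could be combined in one data step along these lines. This is only a sketch: the `$status` format, the file layout, and the data set names `work.good` and `work.review` are invented for the example.

```sas
/* Hypothetical code list: anything not listed formats as "Invalid". */
proc format;
  value $status 'A'   = 'Active'
                'C'   = 'Closed'
                other = 'Invalid';
run;

/* Hypothetical demo file; the date arrives as three separate fields. */
filename demo temp;
data _null_;
  file demo;
  put 'id,status,name,month,day,year';
  put '1,A,smith,7,7,2017';
  put '2,X,jones,7,8,2017';
run;

data work.good work.review;                 /* split into two data sets  */
  infile demo dsd dlm=',' firstobs=2 truncover;
  length id 8 status $1 name $40;           /* explicit lengths          */
  input id status name month day year;
  visit_date = mdy(month, day, year);       /* build a SAS date value    */
  format visit_date date9.;
  name = propcase(name);                    /* standardize case          */
  if put(status, $status.) = 'Invalid' then do;
    putlog 'WARNING: new status code ' status= id= 'on record ' _n_;
    output work.review;
  end;
  else output work.good;
  drop month day year;
run;
```

The second demo record carries the unlisted code `X`, so it is routed to `work.review` and reported in the log instead of silently flowing into the main data set.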

Super User
Posts: 7,949

Re: which is more effective: proc import or infile statement

My 2p's worth: for text files such as CSV, PROC IMPORT is just a wrapper around an INFILE statement that tries to be helpful. For Excel or other imports, you may also need to consider SAS/ACCESS or Office drivers. Personally I would go for a fully text-based format such as CSV or XML, and a reader of your own design with full control. As a bonus, that makes it portable between systems.

Super User
Posts: 7,780

Re: which is more effective: proc import or infile statement

In the long run, a manually written data step will be more effective, because it will not automatically try to adapt to a changed input file structure; instead it will throw an error. That lets you detect problems much earlier in the processing chain.
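How loudly a data step complains depends on the kind of change. As far as I know, an unreadable value is reported in the log as invalid data and sets `_ERROR_`, but the step keeps running, so it can help to add explicit checks. The following is a minimal sketch with a made-up two-column file: it compares the header line against the expected one and flags records that could not be read cleanly.

```sas
/* Hypothetical demo file with one deliberately bad value. */
filename demo temp;
data _null_;
  file demo;
  put 'id,amount';
  put '1,10.50';
  put '2,abc';
run;

data work.strict;
  infile demo dsd dlm=',' truncover;
  if _n_ = 1 then do;
    input;                       /* load the header line into _INFILE_ */
    if strip(_infile_) ne 'id,amount' then do;
      putlog 'ERROR: unexpected header: ' _infile_;
      stop;                      /* do not read a file we do not recognize */
    end;
    delete;                      /* the header is not a data row */
  end;
  input id amount;
  if _error_ then
    putlog 'WARNING: could not read line ' _n_ ': ' _infile_;
run;
```

Whether to stop, warn, or route bad records elsewhere is a design choice; the point is that the step fails visibly at read time instead of passing a silently changed structure downstream.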
