Hello All,
I am trying to understand how the SAS EG Import task works when importing a text file in the two scenarios described below.
My SAS EG is installed on a local Windows 7 machine and connects to a remote SAS server with the following details.
Version details:
SAS EG: 7.15 HF3 (7.100.5.6132) (64-bit), installed on Windows 7 (64-bit)
Remote server: AIX (64-bit) with SAS 9.4 M4
Source text file (pipe-delimited) size: 1 GB. The text file is present on both the SAS server and the local machine.
Free space on local machine: 21.8 GB
As most of you may know, the Import task in SAS EG has 4 steps:
1) Specify Data
2) Select Data Source
3) Define Field Attributes
4) Advanced Options
It is a known fact that when importing text files, SAS EG cleanses the source file before importing its data. In Step 1 of the task, there is also a sub-option under the Performance option to bypass the data cleansing process.
In all the scenarios I discuss below, this bypass sub-option is disabled, i.e. it is not selected. I also do not select the sub-option Limit the amount of source data examined record and field attributes.
As such, I would expect all records in the text file to be scanned when determining variable attributes, and data cleansing to take place before the data in the file is imported.
There is another option I use, in Step 4 of the Import task: Generalize Import Step to Run outside of SAS Enterprise Guide.
In each scenario below I run the Import task without and then with this option selected, and I document what I observe.
Each of my questions below begins with "My question is".
SCENARIO 1 - SAS EG Import of Text File on the SAS Server
As I proceed from Step 1 to Step 2 of the Import task, the task downloads a copy of the file (1 GB) to the local machine to determine the variable attributes.
I then proceed from Step 2 -> Step 3 -> Step 4, choosing the defaults offered at each step.
In Step 4 the option Generalize Import Step to Run outside of SAS Enterprise Guide is not chosen.
After I click the Finish button in Step 4, the task status shows another copy of the file (1 GB) being downloaded from the SAS server to the local machine. Once this is done, the task creates a third copy of the file (1 GB), the cleansed file, on the local machine.
The task then uploads the cleansed file to the SAS server, the DATA step code generated by the Import task runs, and the import completes.
The Import task takes about 4 minutes to complete.
The INFILE statement from the generated DATA step code:
INFILE '/shrproj/saswork4/SAS_work3D070310001C_paasas03/#LN00014'
LRECL=148
ENCODING="LATIN1"
TERMSTR=CRLF
DLM='7F'x
MISSOVER
DSD ;
The INFILE statement clearly points to the uploaded temporary cleansed file, #LN00014, in the SAS work path.
Since 3 copies of the source file are downloaded or created on the local machine, the free space on the local machine after the Import task completes is 21.8 GB - 3 GB = 18.8 GB.
I now close the SAS EG session. The space used on the local machine by the import is freed, and the free space on the local machine is back to 21.8 GB.
Next I repeat the Import task. Again, moving from Step 1 to Step 2, the task downloads a copy of the file (1 GB) to the local machine. I then proceed to Step 3 and Step 4. In Step 4 this time I choose the option Generalize Import Step to Run outside of SAS Enterprise Guide, with the maximum record length set to 32,767. I then click Finish. This time the Import task starts directly with the DATA step code, and the import completes in 51 seconds.
Since only 1 copy of the file is downloaded to the local machine, the free space on the local machine after the Import task completes is 21.8 GB - 1 GB = 20.8 GB.
The INFILE statement from the generated DATA step code:
INFILE '/shrproj/sastemp/Healthscape/Pharmacy_20180328.txt'
LRECL=32767
ENCODING="LATIN1"
DLM='7c'x
MISSOVER
DSD ;
The INFILE statement clearly points to the actual path and name of the text file, and the LRECL value of 32,767 is as expected. Obviously this is code that one can use outside of SAS EG.
My question is: Why does the Import task not download, create, and upload a cleansed file in the above case? Why is the cleansing process bypassed here? Does this have to do with choosing the option Generalize Import Step to Run outside of SAS Enterprise Guide in Step 4 of the Import task? If so, why?
I again close the SAS EG session, the free space on the local machine goes back to 21.8 GB, and I proceed to Scenario 2.
SCENARIO 2 - SAS EG Import of Same Text File from Local Machine
The import now involves the same text file, but this time imported from the local machine. I proceed from Step 1 --> Step 2 --> Step 3 and finally to Step 4. I do not choose Generalize Import Step to Run outside of SAS Enterprise Guide. I click Finish to begin the import. First the task creates a cleansed file on the local machine, a copy of the file (1 GB). Once it is created, it is uploaded to the SAS server, then the DATA step for the import runs and the import completes. The total time for the task is 3 minutes 17 seconds.
The resulting free space on the local machine is now 21.8 GB - 1 GB = 20.8 GB.
The INFILE statement from the generated DATA step code:
INFILE '/shrproj/saswork4/SAS_work259F025900DE_paasas03/#LN00010'
LRECL=148
ENCODING="LATIN1"
TERMSTR=CRLF
DLM='7F'x
MISSOVER
DSD ;
The INFILE statement points to the uploaded temporary cleansed file, #LN00010, on the SAS server in the SAS work path.
I close the SAS EG session again, and the free space on the local machine goes back to 21.8 GB.
I again start importing the file from the local machine and proceed from Step 1 --> Step 2 --> Step 3 --> Step 4. This time I choose the option Generalize Import Step to Run outside of SAS Enterprise Guide and a maximum record length of 32,767. I click Finish and the import begins. The Import task status shows that it is transferring the cleansed file to the SAS server. When this is done, the DATA step for the import runs and the import completes. The Import task takes about 1 minute 25 seconds.
There is no change in the free space on the local machine; it is still 21.8 GB.
The INFILE statement from the generated DATA step code:
INFILE '/shrproj/saswork4/SAS_work104B01EF00E0_paasas03/#LN00010'
LRECL=32767
ENCODING="LATIN1"
DLM='7c'x
MISSOVER
DSD ;
Again the INFILE statement points to the uploaded temporary cleansed file on the SAS server, #LN00010, in the SAS work path, and LRECL=32767 is as expected.
My question is: Where is the cleansed file (which gets uploaded) created? I do not see any change in the free space on the local machine during the above import, so I conclude that the cleansed file is not created on the local machine. But while the Import task is running, I do see a message that the cleansed file is being transferred to the SAS server. Why is this so? Again, I assume this has to do with my selecting the option Generalize Import Step to Run outside of SAS Enterprise Guide in Step 4 of the Import task. But why?
I am sorry the description above is a bit long, but my aim is to understand clearly what is happening, so I wanted to be as clear as possible in order to get answers.
Thanks.
Hi @pchegoor,
Wow, you've done a lot of investigation here! I have two blog posts (which you have probably read) about how the Import Task works.
And then there is this article about the "Bypass cleansing process".
I can't explain all of the activity you're seeing. The local space that is allocated to temp files (prepped by the task) can vary, and I'm not sure if checking the space available at regular intervals while EG is running is the best way to learn the story.
However, if you want to take a peek into the temp files that EG creates locally during its session, you can find the temp folder by looking at Help->About SAS Enterprise Guide, Configuration Details. In that window, you'll see the temp files folder listed.
If you're just curious about exactly what's happening, I'm not sure that I can provide more detail than I already have. However, if there is a technical outcome that you are trying to achieve (faster imports, reduced use of temp space, ensure all processing happens on the server), then let us know. The Import Data task is convenient and is good at designing the import process. But if you want to optimize the import of a file that has a known layout, you'll always do better by writing your own code (or modifying what the task produces to fit your situation).
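As a rough illustration of "writing your own code", here is a minimal sketch of a DATA step that reads the pipe-delimited file directly on the server, reusing the INFILE options from the generalized code posted above. The output data set name and every variable name, length, and informat are hypothetical placeholders, since the real field layout is not shown in this thread; substitute the INPUT section that the Import task generated for your file.

```sas
/* Sketch only: work.pharmacy and all variables below are hypothetical.     */
/* Replace them with the layout the Import task generated for your file.    */
data work.pharmacy;
    infile '/shrproj/sastemp/Healthscape/Pharmacy_20180328.txt'
        lrecl=32767
        encoding="LATIN1"
        dlm='7c'x        /* '7c'x is the pipe character, the file's own delimiter */
        missover
        dsd;
    length member_id $20 drug_name $60;   /* hypothetical character fields */
    informat fill_date yymmdd10.;         /* hypothetical date field       */
    format fill_date yymmdd10.;
    input member_id $ drug_name $ fill_date paid_amount;
run;
```

Because the INFILE path is on the SAS server, a program like this reads the file where it sits, so nothing needs to be downloaded to or uploaded from the EG client.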
@ChrisHemedinger I had more or less figured out the temp location for the SAS EG session on my local machine by trial and error, but it is good to see that it can easily be found under Configuration Details in the About SAS Enterprise Guide window. Thanks for this tip. I always learn something from you about SAS EG.
My basic point in the above question is this: when using the Import task in SAS EG to import an input text file located either on the SAS server or on the local machine (or a Windows shared network drive), selecting the option Generalize Import Step to Run outside of SAS Enterprise Guide in Step 4 of the Import task disables/bypasses the cleansing of the input text file, even if I do not explicitly choose the sub-option Bypass the Data Cleansing Process under the Performance option in Step 1 of the Import task. I am not sure if this is how the SAS EG Import task was designed to work or if it is some kind of bug.
That behavior makes sense to me. Since EG is doing the cleansing work, "generalize for use outside of EG" would have to mean that the cleansing step is skipped. All you're left with is the SAS code -- PROC IMPORT or DATA step with INFILE/INPUT.
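For comparison, a PROC IMPORT version of such standalone code might look like the sketch below. This is not what the Import task generated in this thread; it is only an illustration. The output data set name is hypothetical, GETNAMES=NO assumes the file has no header row, and GUESSINGROWS=MAX asks SAS to scan every record when guessing types and lengths, which can be slow on a 1 GB file.

```sas
/* Sketch only: work.pharmacy_guess is a hypothetical output name. */
proc import datafile='/shrproj/sastemp/Healthscape/Pharmacy_20180328.txt'
        out=work.pharmacy_guess
        dbms=dlm
        replace;
    delimiter='|';       /* pipe-delimited source file                     */
    getnames=no;         /* assumption: no header row; names become VAR1.. */
    guessingrows=max;    /* scan all records to determine attributes       */
run;
```

PROC IMPORT writes the DATA step it generates to the log, so that code can be copied and tuned by hand in the same way as the task's output.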
@ChrisHemedinger So essentially the Bypass the Data Cleansing Process option and the Generalize Import Step to Run outside of SAS Enterprise Guide option seem to have the same effect on the SAS EG Import task. Am I right?
I have one final question on this topic. I am not sure if you can answer it.
I have noticed that when the source text file is on a remote server and I import it using the SAS EG Import task, 2 copies of the file are downloaded from the remote server to the client machine. The 1st copy is downloaded while proceeding from Step 1 to Step 4 of the Import task. This 1st copy is used to examine the records to determine record length and variable attributes. The 2nd copy is downloaded after clicking the Finish button in Step 4. This 2nd copy is then used to create a cleansed file on the client machine. So under the default import scenario, with no Performance options chosen, the client machine must have adequate space for 2 copies of the source file plus the cleansed file, approximately 3 copies of the source file in all. My question is: why can't the Import task use the 1st copy downloaded to the client machine to create the cleansed file, instead of downloading a 2nd copy to accomplish this?
@pchegoor The tasks work in 2 phases: design time and run time. The field examination happens at design time, allowing you to see the field names and specify attributes. When you click Finish, that phase is completed and the task is dismissed.
But then EG needs to actually run the Import process with your settings. The task reinitializes, and downloads a copy of the source file for the cleansing step. This step will happen immediately upon Finish, but also any time you simply refresh the task by re-running the project or flow.
That's how it works. If the source file is so large that the download is a problem, I'd suggest using the Import task on a smaller version of the file to design the fields and generate the code. Then copy that into a SAS program to use on the full file in subsequent runs, skipping the download altogether.
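One way to produce that smaller version on the server is sketched below. The output path and the 10,000-record sample size are assumptions chosen for illustration; pick whatever location and size suit your environment.

```sas
/* Sketch only: the sample path and OBS= value are assumptions. */
data _null_;
    infile '/shrproj/sastemp/Healthscape/Pharmacy_20180328.txt'
        lrecl=32767
        obs=10000;            /* stop after the first 10,000 records      */
    file '/shrproj/sastemp/Healthscape/Pharmacy_sample.txt'
        lrecl=32767;
    input;                    /* load the raw record into _INFILE_        */
    put _infile_;             /* write the record unchanged to the sample */
run;
```

Point the Import task at the sample file to design the fields, then change the INFILE path in the generated code back to the full file. Keep in mind that a sample may not contain the longest values, so character lengths taken from it may need widening before the full run.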