pchegoor
Pyrite | Level 9

Hello All,

 

I am trying to understand how the SAS EG Import task works when importing a text file in the two scenarios described below.

 

My SAS EG is installed on a local Windows 7 machine and connects to a remote SAS server, with the following details.

 

Version Details :

 

SAS EG: 7.15 HF3 (7.100.5.6132) (64-bit), installed on Windows 7 (64-bit)

Remote server: AIX (64-bit) with SAS 9.4 M4

 

Source text file (pipe-delimited) size: 1 GB. The text file is present on both the SAS server and the local machine.
Free space on local machine: 21.8 GB

 

As most of you may know, the Import task in SAS EG has 4 steps:

1) Specify Data
2) Select Data Source
3) Define Field Attributes
4) Advanced Options       

 

It is a known fact that when importing text files, SAS EG cleanses the source file before importing the data in it. In Step 1 of the task, there is a sub-option under the Performance option to bypass this data cleansing process.

In all the scenarios I discuss below, this bypass sub-option is disabled, i.e. it is not selected. I also do not select the sub-option that limits the amount of source data examined to determine record and field attributes. Both settings are shown below.

As such, I would expect all records in the text file to be scanned when determining variable attributes, and I would expect data cleansing to take place before the data in the file is imported.

 

Capture1.PNG

 

 

There is another option I use, on Step 4 of the Import task: Generalize Import Step to Run outside of SAS Enterprise Guide.

 

Capture2.PNG

 

 

In each scenario below, I run the Import task first without and then with the above option selected, and I document what I observe.

I have highlighted my questions in blue.

 

 

SCENARIO 1 - SAS EG Import of Text File on the SAS Server

 

As I proceed from Step 1 to Step 2 of the Import task, the task downloads a copy of the file (1 GB) to the local machine to determine the variable attributes.

I then proceed from Step 2 -> Step 3 -> Step 4, choosing the defaults offered at each step.

On Step 4, the option Generalize Import Step to Run outside of SAS Enterprise Guide is not chosen. After I click the Finish button on Step 4, the task status shows another copy of the file (1 GB) being downloaded from the SAS server to the local machine. Once this is done, the task creates yet another copy of the file (1 GB), the cleansed file, on the local machine.

The task then uploads the cleansed file to the SAS server, the DATA step code generated by the Import task runs, and the import is completed. The Import task takes about 4 minutes to complete.

INFILE statement from the generated DATA step code:

                                           

    INFILE '/shrproj/saswork4/SAS_work3D070310001C_paasas03/#LN00014'
    LRECL=148
    ENCODING="LATIN1"
    TERMSTR=CRLF
    DLM='7F'x
    MISSOVER
    DSD ;
			  										

The INFILE statement clearly points to the uploaded temporary cleansed file, #LN00014, in the SAS Work path.

Since 3 copies of the source file are downloaded or created on the local machine, the free space on the local machine after the above Import task completes is 21.8 GB - 3 GB = 18.8 GB.

I now close the SAS EG session. The space used on the local machine by the import above is freed, and the free space on the local machine is back to 21.8 GB.

Next, I repeat the above Import task. Again, when moving from Step 1 to Step 2, the task downloads a copy of the file (1 GB) to the local machine. I then proceed to Step 3 and Step 4. On Step 4, this time I choose the option Generalize Import Step to Run outside of SAS Enterprise Guide, with a maximum record length of 32,767. I then click Finish. This time the Import task starts directly with the DATA step code, and the import completes in 51 seconds.

Since only 1 copy of the file is downloaded to the local machine, the free space on the local machine after the above Import task completes is 21.8 GB - 1 GB = 20.8 GB.

INFILE statement from the generated DATA step code:

           

   INFILE '/shrproj/sastemp/Healthscape/Pharmacy_20180328.txt'
   LRECL=32767
   ENCODING="LATIN1"
   DLM='7c'x
   MISSOVER
   DSD ;

The INFILE statement clearly points to the actual path and name of the text file, and the LRECL value of 32,767 is as expected. Obviously this is code that one can use outside of SAS EG.
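
For readers comparing the two INFILE statements above: the DLM= values are hexadecimal character constants. On an ASCII-based session (assumed here, since the server is AIX), '7c'x is the pipe character and '7F'x is the non-printable DEL character, which suggests that the cleansed temporary file carries a different delimiter than the original. A minimal sketch for checking this in the log, and for peeking at a few raw records of the original file, might look like the following. The variable names are made up for illustration; the path is the one from the generated code.

```sas
/* Decode the two delimiter constants seen in the generated INFILE statements. */
data _null_;
    pipe_char = '7c'x;          /* 0x7C, the vertical bar in ASCII              */
    del_code  = rank('7F'x);    /* position of 0x7F (DEL) in the collating seq. */
    put pipe_char= del_code=;
run;

/* Write the first few raw records of the original file to the log. */
data _null_;
    infile '/shrproj/sastemp/Healthscape/Pharmacy_20180328.txt'
        lrecl=32767 obs=3;
    input;    /* load each record into the input buffer, creating no variables */
    list;     /* print the buffer to the log                                   */
run;
```

The second step reads only three records, so it is cheap to run even against the 1 GB file.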

 

 

My question is: Why does the Import task not download, create, and upload a cleansed file in the above case? Why is the cleansing process bypassed in this case? Does this have to do with choosing the option Generalize Import Step to Run outside of SAS Enterprise Guide in Step 4 of the Import task? If so, why?

 

 

I again close the SAS EG session, the free space on the local machine goes back to 21.8 GB, and I proceed to Scenario 2.

 

SCENARIO 2 - SAS EG Import of the Same Text File from the Local Machine

 

The import now involves the same text file, but this time imported from the local machine. I proceed from Step 1 --> Step 2 --> Step 3 and finally to Step 4. I do not choose Generalize Import Step to Run outside of SAS Enterprise Guide. I click Finish to begin the import process. First, the task creates a cleansed file on the local machine as a copy of the file (1 GB). After it is created, it is uploaded to the SAS server, then the DATA step for the import runs and the import completes. The total time for the task is 3 minutes 17 seconds.

The free space on the local machine is now 21.8 GB - 1 GB = 20.8 GB.

INFILE statement from the generated DATA step code:

 

        

  INFILE '/shrproj/saswork4/SAS_work259F025900DE_paasas03/#LN00010'
  LRECL=148
  ENCODING="LATIN1"
  TERMSTR=CRLF
  DLM='7F'x
  MISSOVER
  DSD ;

 

The INFILE statement points to the uploaded temporary cleansed file, #LN00010, in the SAS Work path on the SAS server.

 

 

I close the SAS EG session again, and the free space on the local machine goes back to 21.8 GB.

I again start importing the file from the local machine and proceed from Step 1 --> Step 2 --> Step 3 --> Step 4. This time I choose the option Generalize Import Step to Run outside of SAS Enterprise Guide and a maximum record length of 32,767. I click Finish and the import begins. The Import task status shows that it is transferring the cleansed file to the SAS server. When this is done, the DATA step for the import runs and the import completes. The Import task takes about 1 minute 25 seconds.

There is no change in the free space on the local machine; it is still 21.8 GB.

INFILE statement from the generated DATA step code:

 

  INFILE '/shrproj/saswork4/SAS_work104B01EF00E0_paasas03/#LN00010'
  LRECL=32767
  ENCODING="LATIN1"
  DLM='7c'x
  MISSOVER
  DSD ;

Again, the INFILE statement points to the uploaded temporary cleansed file, #LN00010, in the SAS Work path on the SAS server, and LRECL=32767 is as expected.

 

 

My question is: Where is the cleansed file (which gets uploaded) created? I do not see any change in the free space on the local machine during the above import process, so I conclude that the cleansed file is not created on the local machine. But while the Import task is running, I do see a message that the cleansed file is being transferred to the SAS server. Why is this so? Again, I assume this has to do with my selecting the option Generalize Import Step to Run outside of SAS Enterprise Guide in Step 4 of the Import task. But why?

 

 

I am sorry the description above is a bit long, but my aim is to clearly understand what is happening, so I wanted to be as clear as possible for the audience in order to get answers.

 

Thanks.

Accepted Solutions
ChrisHemedinger
Community Manager

@pchegoor The tasks work in 2 phases: design time and run time.  The field examination happens at design time, allowing you to see the field names and specify attributes.  When you click Finish, that phase is completed and the task is dismissed.

 

But then EG needs to actually run the Import process with your settings.  The task reinitializes, and downloads a copy of the source file for the cleansing step.  This step will happen immediately upon Finish, but also any time you simply refresh the task by re-running the project or flow.  

 

That's how it works.  If the source file is so large that the download is a problem, I'd suggest using the Import task on a smaller version of the file to design the fields and generate the code.  Then copy that into a SAS program to use on the full file in subsequent runs, skipping the download altogether.
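
As a rough illustration of that last suggestion, a hand-maintained program might look like the sketch below. It reuses the path and INFILE options from the generalized code posted in the question, so the DATA step reads the file directly on the server and nothing is downloaded or cleansed by EG. The output data set name, the column names, and their lengths and informats are hypothetical placeholders, since the real layout of the file is not shown in this thread; in practice they would be copied from the code the Import task generates on the smaller sample.

```sas
/* Sketch only: WORK.PHARMACY and every column below are hypothetical. */
data work.pharmacy;
    infile '/shrproj/sastemp/Healthscape/Pharmacy_20180328.txt'
        lrecl=32767
        encoding="LATIN1"
        dlm='7c'x          /* pipe delimiter, as in the generalized code   */
        missover
        dsd;               /* add FIRSTOBS=2 if the file has a header row  */

    /* Declare attributes before INPUT so list input uses them. */
    length   member_id $20 drug_name $60;
    informat fill_date yymmdd10.;
    format   fill_date yymmdd10.;

    input member_id $ drug_name $ fill_date paid_amount;
run;
```

Because the program is plain SAS code, re-running the project or flow only re-executes the DATA step on the server.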

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.


5 REPLIES
ChrisHemedinger
Community Manager

Hi @pchegoor,

 

Wow, you've done a lot of investigation here!  I have two blog posts (which you have probably read) about how the Import Task works.

 

And then there is this article about the "Bypass cleansing process".

 

I can't explain all of the activity you're seeing.  The local space that is allocated to temp files (prepped by the task) can vary, and I'm not sure if checking the space available at regular intervals while EG is running is the best way to learn the story.

 

However, if you want to take a peek into the temp files that EG creates locally during its session, you can find the temp folder by looking at Help->About SAS Enterprise Guide, Configuration Details.  In that window, you'll see the temp files folder listed.

 

tempfiles.png

 

If you're just curious about exactly what's happening, I'm not sure that I can provide more detail than I already have.  However, if there is a technical outcome that you are trying to achieve (faster imports, reduced use of temp space, ensure all processing happens on the server), then let us know.  The Import Data task is convenient and is good at designing the import process.  But if you want to optimize the import of a file that has a known layout, you'll always do better by writing your own code (or modifying what the task produces to fit your situation).

pchegoor
Pyrite | Level 9

@ChrisHemedinger I more or less figured out the temp location for the SAS EG session on my local machine by trial and error, but it is good to see that it can easily be found in the Configuration Details of the About SAS Enterprise Guide window. Thanks for this tip. I always learn something from you about SAS EG.

My basic point in the above question is this: when using the Import task in SAS EG to import an input text file located either on the SAS server or on the local machine (or a Windows shared network drive), selecting the option Generalize Import Step to Run outside of SAS Enterprise Guide in Step 4 of the Import task disables or bypasses the cleansing of the input text file, even if I do not explicitly choose the sub-option ByPass the Data Cleansing Process under the Performance option in Step 1 of the Import task. I am not sure whether this is how the SAS EG Import task was designed to work or whether it is some kind of bug.

ChrisHemedinger
Community Manager

That behavior makes sense to me.  Since EG is doing the cleansing work, "generalize for use outside of EG" would have to mean that the cleansing step is skipped.  All you're left with is the SAS code -- PROC IMPORT or DATA step with INFILE/INPUT.
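
For reference, the PROC IMPORT route could be sketched roughly as follows. This is only an illustration, not what the Import task itself generates: the path is the one from the question, the output data set name is hypothetical, and GETNAMES=YES assumes the first row of the file holds column names, which this thread does not confirm.

```sas
/* Sketch only: WORK.PHARMACY_GUESS is a hypothetical data set name. */
proc import datafile='/shrproj/sastemp/Healthscape/Pharmacy_20180328.txt'
        out=work.pharmacy_guess
        dbms=dlm
        replace;
    delimiter='|';        /* pipe-delimited source file                      */
    getnames=yes;         /* assumption: first row contains column names     */
    guessingrows=max;     /* scan every row when guessing types and lengths  */
run;
```

Scanning every row of a 1 GB file takes time, but it all happens on the server; a smaller GUESSINGROWS value trades accuracy of the guessed lengths for speed.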

pchegoor
Pyrite | Level 9

@ChrisHemedinger So essentially the ByPass the Data Cleansing Process option and the Generalize Import Step to Run outside of SAS Enterprise Guide option seem to have the same effect on the SAS EG Import task. Am I right?

I have one final question on this topic. I am not sure if you can answer it.

I have noticed that when the source text file is on a remote server and I import it using the SAS EG Import task, 2 copies of the file are downloaded from the remote server to the client machine. The 1st copy is downloaded when proceeding from Step 1 to Step 4 of the Import task. This 1st copy is used to examine the records to determine record length and variable attributes. The 2nd copy is downloaded after clicking the Finish button on Step 4. This 2nd copy is then used to create a cleansed file on the client machine. So, in the default import scenario with no Performance options chosen, the client machine must have adequate space to store 2 copies of the source file plus the cleansed file, i.e. approximately 3 copies of the source file. My question is: why can't the Import task use the 1st copy downloaded to the client machine to create the cleansed file, instead of downloading a 2nd copy to accomplish this?

