BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasmhe
Fluorite | Level 6

My .csv file has 50+ million records.

 

The maximum widths of each field (Excel column) are known (to me). 

 

But the early records in the .csv file contain fewer characters in some fields than in later records (much later) -- and SAS is truncating data when importing those later records.

 

Is there a way to specify all the field widths in the PROC IMPORT syntax and avoid GUESSINGROWS completely?

 

Or, is there another way to input the .csv file in one pass and not lose data?  

 

I'm new to SAS, so my terms may be off.

 

Thank you.

 

1 ACCEPTED SOLUTION
6 REPLIES 6
sasmhe
Fluorite | Level 6

Thanks, KurtBremer.

The Data Step method worked fine.

sasmhe
Fluorite | Level 6

Hi ChrisNZ,

Method 2 in the link you sent worked perfectly.

Thanks very much,

seamhe

 

jimbarbour
Meteorite | Level 14

Both of the above commenters have given good advice:  Code a Data step.  Let me suggest some alternatives to generate the code:

1.  You can use Proc Import to generate the Data step.  Proc import generates a Data step under the covers, and you can take that code from the log as the basis for your Data step.  It saves you from having to write everything from scratch.  The second set of code below was generated by a Proc Import.  I'm putting the Proc Import code last because I actually prefer my next method.

 

2.  Perhaps even better, if you have Enterprise Guide available to you, you can drag the csv file onto the Process Flow and right click it.  Up will come a pop-up on which one of the choices is "Import."  A dialog then opens the result of which is the generated Data step below.  If I don't like a particular length or definition, the code is mine to modify as I please.

 

Jim

 

Here's the Data step code generated by SAS Enterprise Guide:

/* --------------------------------------------------------------------
   Code generated by a SAS task
   
   Generated on Tuesday, September 22, 2020 at 8:34:49 PM
   By task:     Import Data Wizard
   
   Source file:
   I:\commercial\development\Core_Datasets\pgm\Related\ACTV_MBR_SRVC_P
   OP_Mismatch.txt
   Server:      Local File System
   
   Output data: WORK.ACTV_MBR_SRVC_POP_Mismatch
   Server:      Local
   -------------------------------------------------------------------- */

DATA WORK.ACTV_MBR_SRVC_POP_Mismatch;
    LENGTH
        Actv_MBR_GLB_ID  $ 13
        GbID_MBR_GLB_ID  $ 13
        Actv_MBR_ID      $ 23
        GbID_MBR_ID      $ 23
        Srvc_MBR_ID      $ 23
        Srvc_MBR_MOD_ID    8
        GbID_MBR_MOD_ID    8
        GbID_D_MBR_SK      8
        Srvc_D_MBR_SK      8
        Actv_active_mbrs $ 1
        MBR_SRVC_POP_CD  $ 19
        MBR_SRVC_POP_IND $ 1 ;
    FORMAT
        Actv_MBR_GLB_ID  $CHAR13.
        GbID_MBR_GLB_ID  $CHAR13.
        Actv_MBR_ID      $CHAR23.
        GbID_MBR_ID      $CHAR23.
        Srvc_MBR_ID      $CHAR23.
        Srvc_MBR_MOD_ID  BEST8.
        GbID_MBR_MOD_ID  BEST8.
        GbID_D_MBR_SK    BEST8.
        Srvc_D_MBR_SK    BEST8.
        Actv_active_mbrs $CHAR1.
        MBR_SRVC_POP_CD  $CHAR19.
        MBR_SRVC_POP_IND $CHAR1. ;
    INFORMAT
        Actv_MBR_GLB_ID  $CHAR13.
        GbID_MBR_GLB_ID  $CHAR13.
        Actv_MBR_ID      $CHAR23.
        GbID_MBR_ID      $CHAR23.
        Srvc_MBR_ID      $CHAR23.
        Srvc_MBR_MOD_ID  BEST8.
        GbID_MBR_MOD_ID  BEST8.
        GbID_D_MBR_SK    BEST8.
        Srvc_D_MBR_SK    BEST8.
        Actv_active_mbrs $CHAR1.
        MBR_SRVC_POP_CD  $CHAR19.
        MBR_SRVC_POP_IND $CHAR1. ;
    INFILE 'I:\commercial\development\Core_Datasets\pgm\Related\ACTV_MBR_SRVC_POP_Mismatch.txt'
        LRECL=32767
        FIRSTOBS=2
        ENCODING="WLATIN1"
        DLM='09'x
        MISSOVER
        DSD ;
    INPUT
        Actv_MBR_GLB_ID  : $CHAR13.
        GbID_MBR_GLB_ID  : $CHAR13.
        Actv_MBR_ID      : $CHAR23.
        GbID_MBR_ID      : $CHAR23.
        Srvc_MBR_ID      : $CHAR23.
        Srvc_MBR_MOD_ID  : ?? BEST8.
        GbID_MBR_MOD_ID  : ?? BEST8.
        GbID_D_MBR_SK    : ?? BEST8.
        Srvc_D_MBR_SK    : ?? BEST8.
        Actv_active_mbrs : $CHAR1.
        MBR_SRVC_POP_CD  : $CHAR19.
        MBR_SRVC_POP_IND : $CHAR1. ;
RUN;

Here's the code generated by a Proc Import.  I list the Proc Import first and immediately following is the code it generated which I cut and paste in from the log.

Proc  IMPORT	
	DATAFILE='I:\commercial\development\Core_Datasets\pgm\Related\ACTV_MBR_SRVC_POP_Mismatch.txt'
	OUT	=	WORK.ACTV_MBR_SRVC_POP_MISMATCH2
	DBMS=DLM
	REPLACE
	;
	DELIMITER='09'X;
	DATAROW=2;
	GUESSINGROWS=20000;
RUN;

 /**********************************************************************
 *   PRODUCT:   SAS
 *   VERSION:   9.4
 *   CREATOR:   External File Interface
 *   DATE:      22SEP20
 *   DESC:      Generated SAS Datastep Code
 *   TEMPLATE SOURCE:  (None Specified.)
 ***********************************************************************/
    data WORK.ACTV_MBR_SRVC_POP_MISMATCH2    ;
    infile 'I:\commercial\development\Core_Datasets\pgm\Related\ACTV_MBR_SRVC_POP_Mismatch.txt' delimiter='09'x
MISSOVER DSD lrecl=32767 firstobs=2 ;
       informat Actv_MBR_GLB_ID $13. ;
       informat GbID_MBR_GLB_ID $13. ;
       informat Actv_MBR_ID $23. ;
       informat GbID_MBR_ID $23. ;
       informat Srvc_MBR_ID $23. ;
       informat Srvc_MBR_MOD_ID best32. ;
       informat GbID_MBR_MOD_ID best32. ;
       informat GbID_D_MBR_SK best32. ;
       informat Srvc_D_MBR_SK best32. ;
       informat Actv_active_mbrs $1. ;
       informat MBR_SRVC_POP_CD $19. ;
       informat MBR_SRVC_POP_IND $1. ;
       format Actv_MBR_GLB_ID $13. ;
       format GbID_MBR_GLB_ID $13. ;
       format Actv_MBR_ID $23. ;
       format GbID_MBR_ID $23. ;
       format Srvc_MBR_ID $23. ;
       format Srvc_MBR_MOD_ID best12. ;
       format GbID_MBR_MOD_ID best12. ;
       format GbID_D_MBR_SK best12. ;
       format Srvc_D_MBR_SK best12. ;
       format Actv_active_mbrs $1. ;
       format MBR_SRVC_POP_CD $19. ;
       format MBR_SRVC_POP_IND $1. ;
    input
                Actv_MBR_GLB_ID  $
                GbID_MBR_GLB_ID  $
                Actv_MBR_ID  $
                GbID_MBR_ID  $
                Srvc_MBR_ID  $
                Srvc_MBR_MOD_ID
                GbID_MBR_MOD_ID
                GbID_D_MBR_SK
                Srvc_D_MBR_SK
                Actv_active_mbrs  $
                MBR_SRVC_POP_CD  $
                MBR_SRVC_POP_IND  $
    ;
    run;
sasmhe
Fluorite | Level 6

Hi jimbarbour,

The Data Step method worked for me.  

Thanks for your response, and I've made a note of your alternative suggestions.

-- seamhe

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1339 views
  • 4 likes
  • 4 in conversation