My .csv file has 50+ million records.
The maximum widths of each field (Excel column) are known (to me).
But the early records in the .csv file contain fewer characters in some fields than in later records (much later) -- and SAS is truncating data when importing those later records.
Is there a way to specify all the field widths in the PROC IMPORT syntax and avoid GUESSINGROWS completely?
Or, is there another way to input the .csv file in one pass and not lose data?
I'm new to SAS, so my terms may be off.
Thank you.
Since you have a description of the columns, just write the DATA step for the import yourself.
Thanks, KurtBremer.
The Data Step method worked fine.
See method 2 here.
Both of the above commenters have given good advice: Code a Data step. Let me suggest some alternatives to generate the code:
1. You can use Proc Import to generate the Data step. Proc import generates a Data step under the covers, and you can take that code from the log as the basis for your Data step. It saves you from having to write everything from scratch. The second set of code below was generated by a Proc Import. I'm putting the Proc Import code last because I actually prefer my next method.
2. Perhaps even better, if you have Enterprise Guide available to you, you can drag the csv file onto the Process Flow and right click it. Up will come a pop-up on which one of the choices is "Import." A dialog then opens the result of which is the generated Data step below. If I don't like a particular length or definition, the code is mine to modify as I please.
Jim
Here's the Data step code generated by SAS Enterprise Guide:
/* --------------------------------------------------------------------
Code generated by a SAS task
Generated on Tuesday, September 22, 2020 at 8:34:49 PM
By task: Import Data Wizard
Source file:
I:\commercial\development\Core_Datasets\pgm\Related\ACTV_MBR_SRVC_P
OP_Mismatch.txt
Server: Local File System
Output data: WORK.ACTV_MBR_SRVC_POP_Mismatch
Server: Local
-------------------------------------------------------------------- */
DATA WORK.ACTV_MBR_SRVC_POP_Mismatch;
LENGTH
Actv_MBR_GLB_ID $ 13
GbID_MBR_GLB_ID $ 13
Actv_MBR_ID $ 23
GbID_MBR_ID $ 23
Srvc_MBR_ID $ 23
Srvc_MBR_MOD_ID 8
GbID_MBR_MOD_ID 8
GbID_D_MBR_SK 8
Srvc_D_MBR_SK 8
Actv_active_mbrs $ 1
MBR_SRVC_POP_CD $ 19
MBR_SRVC_POP_IND $ 1 ;
FORMAT
Actv_MBR_GLB_ID $CHAR13.
GbID_MBR_GLB_ID $CHAR13.
Actv_MBR_ID $CHAR23.
GbID_MBR_ID $CHAR23.
Srvc_MBR_ID $CHAR23.
Srvc_MBR_MOD_ID BEST8.
GbID_MBR_MOD_ID BEST8.
GbID_D_MBR_SK BEST8.
Srvc_D_MBR_SK BEST8.
Actv_active_mbrs $CHAR1.
MBR_SRVC_POP_CD $CHAR19.
MBR_SRVC_POP_IND $CHAR1. ;
INFORMAT
Actv_MBR_GLB_ID $CHAR13.
GbID_MBR_GLB_ID $CHAR13.
Actv_MBR_ID $CHAR23.
GbID_MBR_ID $CHAR23.
Srvc_MBR_ID $CHAR23.
Srvc_MBR_MOD_ID BEST8.
GbID_MBR_MOD_ID BEST8.
GbID_D_MBR_SK BEST8.
Srvc_D_MBR_SK BEST8.
Actv_active_mbrs $CHAR1.
MBR_SRVC_POP_CD $CHAR19.
MBR_SRVC_POP_IND $CHAR1. ;
INFILE 'I:\commercial\development\Core_Datasets\pgm\Related\ACTV_MBR_SRVC_POP_Mismatch.txt'
LRECL=32767
FIRSTOBS=2
ENCODING="WLATIN1"
DLM='09'x
MISSOVER
DSD ;
INPUT
Actv_MBR_GLB_ID : $CHAR13.
GbID_MBR_GLB_ID : $CHAR13.
Actv_MBR_ID : $CHAR23.
GbID_MBR_ID : $CHAR23.
Srvc_MBR_ID : $CHAR23.
Srvc_MBR_MOD_ID : ?? BEST8.
GbID_MBR_MOD_ID : ?? BEST8.
GbID_D_MBR_SK : ?? BEST8.
Srvc_D_MBR_SK : ?? BEST8.
Actv_active_mbrs : $CHAR1.
MBR_SRVC_POP_CD : $CHAR19.
MBR_SRVC_POP_IND : $CHAR1. ;
RUN;
Here's the code generated by a Proc Import. I list the Proc Import first and immediately following is the code it generated which I cut and paste in from the log.
Proc IMPORT
DATAFILE='I:\commercial\development\Core_Datasets\pgm\Related\ACTV_MBR_SRVC_POP_Mismatch.txt'
OUT = WORK.ACTV_MBR_SRVC_POP_MISMATCH2
DBMS=DLM
REPLACE
;
DELIMITER='09'X;
DATAROW=2;
GUESSINGROWS=20000;
RUN;
/**********************************************************************
* PRODUCT: SAS
* VERSION: 9.4
* CREATOR: External File Interface
* DATE: 22SEP20
* DESC: Generated SAS Datastep Code
* TEMPLATE SOURCE: (None Specified.)
***********************************************************************/
data WORK.ACTV_MBR_SRVC_POP_MISMATCH2 ;
infile 'I:\commercial\development\Core_Datasets\pgm\Related\ACTV_MBR_SRVC_POP_Mismatch.txt' delimiter='09'x
MISSOVER DSD lrecl=32767 firstobs=2 ;
informat Actv_MBR_GLB_ID $13. ;
informat GbID_MBR_GLB_ID $13. ;
informat Actv_MBR_ID $23. ;
informat GbID_MBR_ID $23. ;
informat Srvc_MBR_ID $23. ;
informat Srvc_MBR_MOD_ID best32. ;
informat GbID_MBR_MOD_ID best32. ;
informat GbID_D_MBR_SK best32. ;
informat Srvc_D_MBR_SK best32. ;
informat Actv_active_mbrs $1. ;
informat MBR_SRVC_POP_CD $19. ;
informat MBR_SRVC_POP_IND $1. ;
format Actv_MBR_GLB_ID $13. ;
format GbID_MBR_GLB_ID $13. ;
format Actv_MBR_ID $23. ;
format GbID_MBR_ID $23. ;
format Srvc_MBR_ID $23. ;
format Srvc_MBR_MOD_ID best12. ;
format GbID_MBR_MOD_ID best12. ;
format GbID_D_MBR_SK best12. ;
format Srvc_D_MBR_SK best12. ;
format Actv_active_mbrs $1. ;
format MBR_SRVC_POP_CD $19. ;
format MBR_SRVC_POP_IND $1. ;
input
Actv_MBR_GLB_ID $
GbID_MBR_GLB_ID $
Actv_MBR_ID $
GbID_MBR_ID $
Srvc_MBR_ID $
Srvc_MBR_MOD_ID
GbID_MBR_MOD_ID
GbID_D_MBR_SK
Srvc_D_MBR_SK
Actv_active_mbrs $
MBR_SRVC_POP_CD $
MBR_SRVC_POP_IND $
;
run;
Hi jimbarbour,
The Data Step method worked for me.
Thanks for your response, and I've made a note of your alternative suggestions.
-- seamhe
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.