Hi, I'm trying to import a semicolon-delimited text file of about 35 million observations into SAS. The data transfer stops at observation 10,892,049.

The file consists of records for technical documents, each of which has a title. It has six variables (not enclosed in parentheses), and the fourth variable is the document title, declared as VARCHAR(1500). You can see why a title could be very long, for example a chemistry paper that reports on a large biological molecule, while other titles are very short. The titles contain heterogeneous text. When I examine the final observation, 10,892,049, reading stops in the middle of the title field, at the character '/'. And yes, I know that this is a special character in SAS.

So, I have three questions. First, I do not really need the title variable. Using INFILE and informats, is there a way to skip over the title field? Second, and alternatively, can I tell SAS to accept special characters as part of the title field? Finally, I do not think that memory problems are causing execution to stop; am I correct?

I am running SAS 9.4 on Windows 10, installed on an HP server. So far I have been using PROC IMPORT, with the following code:

    options nocenter replace ls=76 ps=54;
    libname tlaivs 'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\sasdsns';

    proc import datafile="C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.txt"
        dbms=dlm
        out=tlaivs.item
        replace;
      delimiter=';';
      getnames=yes;
      guessingrows=MAX;
    run;

    proc contents data=tlaivs.item;
    run;

--------

The log file looks like this:

    NOTE: Copyright (c) 2016 by SAS Institute Inc., Cary, NC, USA.
    NOTE: SAS (r) Proprietary Software 9.4 (TS1M6)
          Licensed to JAMES ADAMS, Site 70250096.
    NOTE: This session is executing on the X64_10PRO platform.
    NOTE: Analytical products:
          SAS/STAT 15.1
    NOTE: Additional host information:
     X64_10PRO WIN 10.0.18362 Workstation
    NOTE: SAS initialization used:
          real time           1.43 seconds
          cpu time            1.28 seconds

    1    options nocenter replace ls=76 ps=54;
    2    libname tlaivs
    2  ! 'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\sasdsns';
    NOTE: Libref TLAIVS was successfully assigned as follows:
          Engine:        V9
          Physical Name: C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\sasdsns
    3
    4    proc import
    4  ! datafile="C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.tx
    4  ! t"
    5    dbms=dlm
    6    out=tlaivs.item
    7    replace;
    8
    9    delimiter=';';
    10   getnames=yes;
    11   guessingrows=MAX;
    12   run;

    13    /*********************************************************************
    13  ! *
    14    *   PRODUCT:   SAS
    15    *   VERSION:   9.4
    16    *   CREATOR:   External File Interface
    17    *   DATE:      11OCT19
    18    *   DESC:      Generated SAS Datastep Code
    19    *   TEMPLATE SOURCE:  (None Specified.)
    20    **********************************************************************
    20  ! */
    21       data TLAIVS.ITEM    ;
    22       %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    23       infile
    23 ! 'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.txt'
    23 ! delimiter = ';' MISSOVER DSD lrecl=32767 firstobs=2 ;
    24          informat item_id $15. ;
    25          informat issue_id $10. ;
    26          informat item_number best32. ;
    27          informat title $819. ;
    28          informat doc_type $32. ;
    29          informat ref_count $4. ;
    30          format item_id $15. ;
    31          format issue_id $10. ;
    32          format item_number best12. ;
    33          format title $819. ;
    34          format doc_type $32. ;
    35          format ref_count $4. ;
    36       input
    37                   item_id $
    38                   issue_id $
    39                   item_number
    40                   title $
    41                   doc_type $
    42                   ref_count $
    43       ;
    44       if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection
    44 ! macro variable */
    45       run;

    NOTE: The infile
          'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.txt' is:
          Filename=C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.tx
          t,
          RECFM=V,LRECL=32767,
          File Size (bytes)=4792868783,
          Last Modified=16Jul2015:16:45:04,
          Create Time=07Oct2019:08:55:42

    NOTE: 10892049 records were read from the infile
          'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.txt'.
          The minimum record length was 40.
          The maximum record length was 865.
    NOTE: The data set TLAIVS.ITEM has 10892049 observations and 6 variables.
    NOTE: DATA statement used (Total process time):
          real time           50.38 seconds
          cpu time            11.95 seconds

    10892049 rows created in TLAIVS.ITEM from
    C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.txt.

    NOTE: TLAIVS.ITEM data set was successfully created.
    NOTE: The data set TLAIVS.ITEM has 10892049 observations and 6 variables.
    NOTE: PROCEDURE IMPORT used (Total process time):
          real time           31:17.49
          cpu time            30:35.93

    46
    47   proc contents data=tlaivs.item;
    NOTE: Writing HTML Body file: sashtml.htm
    48   run;

    NOTE: PROCEDURE CONTENTS used (Total process time):
          real time           2.98 seconds
          cpu time            0.57 seconds

------------------------------------

Thank you for your help! I regret having to post this, but none of the white papers I have read directly address these questions.

Sincerely,
James D. Adams
SAS User
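For the first question, here is a minimal sketch of a hand-written DATA step that could replace PROC IMPORT and discard the title. It assumes the six fields always appear in the order shown in the generated code in the log and reuses the lengths PROC IMPORT guessed; SKIP_TITLE is a hypothetical placeholder name. With list input SAS still has to scan past the title to reach the fifth field, so the title is read into a one-byte variable that is then dropped. Writing the DATA step directly also avoids the full-file scan that GUESSINGROWS=MAX requests, which is the likely reason PROC IMPORT took about 31 minutes while its generated DATA step ran in about 50 seconds.

```sas
/* Sketch: read item.txt directly and skip the title field.        */
/* Lengths copy the informats PROC IMPORT generated;               */
/* skip_title is a hypothetical throwaway variable.                */
libname tlaivs 'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\sasdsns';

data tlaivs.item (drop=skip_title);
  infile 'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.txt'
         dlm=';' dsd truncover lrecl=32767 firstobs=2;
  length item_id $15 issue_id $10 item_number 8
         skip_title $1 doc_type $32 ref_count $4;
  /* List input scans each field up to the next ';'. The whole title */
  /* is scanned, but only its first byte is stored in skip_title.    */
  input item_id issue_id item_number skip_title doc_type ref_count;
run;
```

Because the LENGTH statement declares the character variables first, the INPUT statement does not need $ after each name.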
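For the second question: a '/' inside the data is ordinary text to SAS; it acts as a line-pointer control only when written in an INPUT statement. With DLM=';' and DSD, what can break a title is an embedded semicolon or a stray quotation mark. The sketch below keeps the title at its declared VARCHAR(1500) width (PROC IMPORT guessed $819 from the rows it saw) and adds IGNOREDOSEOF, a Windows INFILE option that makes SAS treat a Ctrl-Z byte (hex 1A) as data rather than as end-of-file. That a Ctrl-Z byte is what stopped the read is an assumption here, not something the log shows.

```sas
/* Sketch: keep the title at its declared width and read a Ctrl-Z */
/* byte ('1A'x) as data. IGNOREDOSEOF is a Windows INFILE option; */
/* whether this file actually needs it is an assumption.          */
libname tlaivs 'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\sasdsns';

data tlaivs.item;
  infile 'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.txt'
         dlm=';' dsd truncover lrecl=32767 firstobs=2 ignoredoseof;
  length item_id $15 issue_id $10 item_number 8
         title $1500 doc_type $32 ref_count $4;
  /* '/' in the data is plain text; only ';' and quotation marks  */
  /* (because of DSD) affect how the fields are split.            */
  input item_id issue_id item_number title doc_type ref_count;
run;
```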
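On the third question: the log ends normally, with no ERROR or WARNING, and a DATA step keeps only the current record in its input buffer, so memory is unlikely to be the cause. A silent early stop like this is what an end-of-file marker in the middle of the file would produce, for example a Ctrl-Z byte (hex 1A), which SAS on Windows treats as end-of-file in variable-length files unless IGNOREDOSEOF is given. Here is a sketch for checking that assumption by printing the lines around the stopping point to the log; the FIRSTOBS=/OBS= window is a guess that allows for the header line.

```sas
/* Sketch: dump the lines around where reading stopped.          */
/* Window assumes the 10,892,049 observations came from lines 2  */
/* through 10,892,050 (line 1 holds the column names).           */
data _null_;
  infile 'C:\Adams_Data\Adams_NSF1_Commerc_Compsci\Database_IVs\item.txt'
         lrecl=32767 ignoredoseof firstobs=10892049 obs=10892052;
  input;
  list;                            /* unprintable bytes appear in hex */
  pos = index(_infile_, '1A'x);    /* column of a Ctrl-Z byte, if any */
  if pos > 0 then do;
    line = 10892048 + _n_;         /* physical line number in item.txt */
    put 'NOTE: Ctrl-Z (1A hex) found at ' line= pos=;
  end;
run;
```

If no such byte turns up, the LIST output still shows exactly which characters surround the '/' where the title was cut off.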