Solved: Reading a csv file with input statement - how to deal with two consecu...

alepage · Posted 06-05-2025 04:50 PM

Hello,

Here's the last line of my csv file read with notepad++

960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49cflf
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|crlf

crlf

The last crlf cause to get an empty row that I would like to eliminate during the input statement. How to do that?

data dest1.%scan(&csvFile,1,'.');		

   		attrib 
          			POLICY_NUMBER				label="Policy Number" 		length=$25. format=$char25.	informat=$char25.
		  		POLICY_NUMBER_SRC 	 			label="Policy Number Source" 		length=$25. format=$char25. 	informat=$char25.
		  		LINE_OF_BUSINESS				label="Line of Business" 		length=$3.  format=$char3.  	informat=$char3.
		  		PROVINCE       				label="Province" 			length=$10. format=$char10. 	informat=$char10.
		  		LEGACY_UNDERWRITING_COMPANY 			label="Legacy Underwriting Company" 	length=$10. format=$char10. 	informat=$char10.
		  		UNDERWRITING_COMPANY       			label="Underwriting Company" 		length=$10. format=$char10. 	informat=$char10.
		  		DISTRIBUTOR_NUMBER				label="Distributor Number" 		length=$10. format=$char10. 	informat=$char10.
		  		DISTRIBUTOR_NUMBER_SRC	    		label="Distributor Number Source" 	length=$10. format=$char10. 	informat=$char10.
		  		SOURCE_SYSTEM 	 			label="Source System"		length=$15. format=$char15. 	informat=$char15.
		  		DATASOURCE_BACKEND_REFERENCE			label="Data Source Backend Reference"	length=$10. format=$char10. 	informat=$char10.
		  		LAST_TERM_EFFECTIVE_DATE    			label="Last Term Effective Date" 	length=8 	format=YYMMDD10. 	informat=YYMMDD10.
		  		LAST_TERM_EXPIRY_DATE 			label="Last Term Expiry Date" 	length=8	format=YYMMDD10. 	informat=YYMMDD10.
		  		CONTRACT_UUID				label="Contract UUID" 	          length=$36. format=$char36. 	informat=$char36.
   			;


		
infile "&csvinputpath/&csvFile." LRECL=32767 TERMSTR=CRLF DLM='|' MISSOVER FIRSTOBS=2 DSD;
		INPUT
          		POLICY_NUMBER					: $char25.
		  		POLICY_NUMBER_SRC 				: $char25.
		  		LINE_OF_BUSINESS				: $char3.
		  		PROVINCE       				: $char10.
		  		LEGACY_UNDERWRITING_COMPANY 			: $char10.
		  		UNDERWRITING_COMPANY       			: $char10.
		  		DISTRIBUTOR_NUMBER				: $char10.
		  		DISTRIBUTOR_NUMBER_SRC			: $char10.
		  		SOURCE_SYSTEM 	 			: $char15.
		  		DATASOURCE_BACKEND_REFERENCE			: $char10.
		  		LAST_TERM_EFFECTIVE_DATE    			: YYMMDD10.
		  		LAST_TERM_EXPIRY_DATE 			: YYMMDD10.
		  		CONTRACT_UUID				: $char36.
		;
                    run;

Tom · Posted 06-05-2025 07:48 PM

Add a DELETE statement to delete the empty line. If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.

NOTE:

There is no need to include a period in a LENGTH, the values are always integers.
And no need to attach formats or informats to normal character variables.
Best to avoid the MISSOVER option. Shouldn't matter with this code since you are using LIST MODE input. But if you slipped into FORMATTED MODE input the MISSOVER option will throw away values at the end of the line that are too short for the informat being used. So unless you really want that to happen always use the TRUNCOVER option instead.
Avoid including TAB characters in your program code. It makes the code jump around when you try to edit it.

Example:

options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|

;

data want;
  attrib
    POLICY_NUMBER label="Policy Number" length=$25
    POLICY_NUMBER_SRC label="Policy Number Source" length=$25
    LINE_OF_BUSINESS label="Line of Business" length=$3
    PROVINCE label="Province" length=$10
    LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
    UNDERWRITING_COMPANY label="Underwriting Company" length=$10
    DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
    DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
    SOURCE_SYSTEM label="Source System" length=$15
    DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
    LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
    LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
    CONTRACT_UUID label="Contract UUID" length=$36
  ;
  infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
  INPUT POLICY_NUMBER -- CONTRACT_UUID ;
  if eof and _infile_=' ' then delete;
run;

proc print;
run;

Log:

NOTE: 3 records were read from the infile CSV.
      The minimum record length was 0.
      The maximum record length was 98.
NOTE: The data set WORK.WANT has 2 observations and 13 variables.

Results

                                                           The SAS System                           19:28 Thursday, June 5, 2025   1

                                                               D
                                        L                      A
                                        E                      T
                                        G                      A
                                        A                      S      L
                                        C                      O      A
                                        Y             D        U      S
                                        _             I        R      T           L
                                        U   U         S        C      _           A
                                        N   N         T        E      T           S
                                        D   D    D    R        _      E           T
                       P                E   E    I    I        B      R           _
                       O        L       R   R    S    B        A      M           T
                       L        I       W   W    T    U        C      _           E
                       I        N       R   R    R    T        K      E           R
            P          C        E       I   I    I    O   S    E      F           M                        C
            O          Y        _       T   T    B    R   O    N      F           _                        O
            L          _        O       I   I    U    _   U    D      E           E                        N
            I          N        F       N   N    T    N   R    _      C           X                        T
            C          U        _       G   G    O    U   C    R      T           P                        R
            Y          M        B   P   _   _    R    M   E    E      I           I                        A
            _          B        U   R   C   C    _    B   _    F      V           R                        C
            N          E        S   O   O   O    N    E   S    E      E           Y                        T
            U          R        I   V   M   M    U    R   Y    R      _           _                        _
            M          _        N   I   P   P    M    _   S    E      D           D                        U
      O     B          S        E   N   A   A    B    S   T    N      A           A                        U
      b     E          R        S   C   N   N    E    R   E    C      T           T                        I
      s     R          C        S   E   Y   Y    R    C   M    E      E           E                        D

      1  960-6217  AAA-AAA     RE   QC  B  010  BEL      CLAS     2008-05-01  2010-05-01  011047fd-c0d6-485f-9712-b5e8068b7b49
      2  A12-3358  AAAA-BBBBB  AUP  QC  A  001  3143     CLAS     2005-03-01  2006-03-01

View solution in original post

Tom · Posted 06-05-2025 07:48 PM

Add a DELETE statement to delete the empty line. If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.

NOTE:

There is no need to include a period in a LENGTH, the values are always integers.
And no need to attach formats or informats to normal character variables.
Best to avoid the MISSOVER option. Shouldn't matter with this code since you are using LIST MODE input. But if you slipped into FORMATTED MODE input the MISSOVER option will throw away values at the end of the line that are too short for the informat being used. So unless you really want that to happen always use the TRUNCOVER option instead.
Avoid including TAB characters in your program code. It makes the code jump around when you try to edit it.

Example:

options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|

;

data want;
  attrib
    POLICY_NUMBER label="Policy Number" length=$25
    POLICY_NUMBER_SRC label="Policy Number Source" length=$25
    LINE_OF_BUSINESS label="Line of Business" length=$3
    PROVINCE label="Province" length=$10
    LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
    UNDERWRITING_COMPANY label="Underwriting Company" length=$10
    DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
    DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
    SOURCE_SYSTEM label="Source System" length=$15
    DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
    LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
    LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
    CONTRACT_UUID label="Contract UUID" length=$36
  ;
  infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
  INPUT POLICY_NUMBER -- CONTRACT_UUID ;
  if eof and _infile_=' ' then delete;
run;

proc print;
run;

Log:

NOTE: 3 records were read from the infile CSV.
      The minimum record length was 0.
      The maximum record length was 98.
NOTE: The data set WORK.WANT has 2 observations and 13 variables.

Results

                                                           The SAS System                           19:28 Thursday, June 5, 2025   1

                                                               D
                                        L                      A
                                        E                      T
                                        G                      A
                                        A                      S      L
                                        C                      O      A
                                        Y             D        U      S
                                        _             I        R      T           L
                                        U   U         S        C      _           A
                                        N   N         T        E      T           S
                                        D   D    D    R        _      E           T
                       P                E   E    I    I        B      R           _
                       O        L       R   R    S    B        A      M           T
                       L        I       W   W    T    U        C      _           E
                       I        N       R   R    R    T        K      E           R
            P          C        E       I   I    I    O   S    E      F           M                        C
            O          Y        _       T   T    B    R   O    N      F           _                        O
            L          _        O       I   I    U    _   U    D      E           E                        N
            I          N        F       N   N    T    N   R    _      C           X                        T
            C          U        _       G   G    O    U   C    R      T           P                        R
            Y          M        B   P   _   _    R    M   E    E      I           I                        A
            _          B        U   R   C   C    _    B   _    F      V           R                        C
            N          E        S   O   O   O    N    E   S    E      E           Y                        T
            U          R        I   V   M   M    U    R   Y    R      _           _                        _
            M          _        N   I   P   P    M    _   S    E      D           D                        U
      O     B          S        E   N   A   A    B    S   T    N      A           A                        U
      b     E          R        S   C   N   N    E    R   E    C      T           T                        I
      s     R          C        S   E   Y   Y    R    C   M    E      E           E                        D

      1  960-6217  AAA-AAA     RE   QC  B  010  BEL      CLAS     2008-05-01  2010-05-01  011047fd-c0d6-485f-9712-b5e8068b7b49
      2  A12-3358  AAAA-BBBBB  AUP  QC  A  001  3143     CLAS     2005-03-01  2006-03-01

alepage · Posted 06-06-2025 11:30 AM

I have added end=eof and the if eof and _infile_=' ' then delete; at the end of the input statment as you did and it works fine

Ksharp · Posted 06-05-2025 09:06 PM

I would like to use Perl Regular Express to find the start of each row . and use SCAN() to get all these variables.

options parmcards=x;
filename x temp;
parmcards;
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|


960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;

data temp;
infile x length=len;
input x $varying5000. len;
if prxmatch('/^\d{3}-\d{4}/',x) then group+1;
run;
data want;
do until(last.group);
 set temp;
 by group;
 length line $ 20000;
 line=cats(line,x);
end;

length col1-col4 $ 80;
col1=scan(line,1,'|');
col2=scan(line,2,'|');
col3=scan(line,3,'|');
col4=scan(line,4,'|');

drop x;
run;

Reading a csv file with input statement - how to deal with two consecutive carriage return linefeed

Re: Reading a csv file with input statement - how to deal with two consecutive carriage return linef

Re: Reading a csv file with input statement - how to deal with two consecutive carriage return linef

Re: Reading a csv file with input statement - how to deal with two consecutive carriage return linef

Re: Reading a csv file with input statement - how to deal with two consecutive carriage return linef

Reading a csv file with input statement - how to deal with two consecutive carriage return linefeed

Re: Reading a csv file with input statement - how to deal with two consecutive carriage return linef

Re: Reading a csv file with input statement - how to deal with two consecutive carriage return linef

Re: Reading a csv file with input statement - how to deal with two consecutive carriage return linef

Re: Reading a csv file with input statement - how to deal with two consecutive carriage return linef

Registration is open

SAS Training: Just a Click Away