Hello,
Here's the last line of my csv file read with notepad++
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49cflf
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|crlf
crlf
The last crlf cause to get an empty row that I would like to eliminate during the input statement. How to do that?
data dest1.%scan(&csvFile,1,'.');
attrib
POLICY_NUMBER label="Policy Number" length=$25. format=$char25. informat=$char25.
POLICY_NUMBER_SRC label="Policy Number Source" length=$25. format=$char25. informat=$char25.
LINE_OF_BUSINESS label="Line of Business" length=$3. format=$char3. informat=$char3.
PROVINCE label="Province" length=$10. format=$char10. informat=$char10.
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10. format=$char10. informat=$char10.
UNDERWRITING_COMPANY label="Underwriting Company" length=$10. format=$char10. informat=$char10.
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10. format=$char10. informat=$char10.
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10. format=$char10. informat=$char10.
SOURCE_SYSTEM label="Source System" length=$15. format=$char15. informat=$char15.
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10. format=$char10. informat=$char10.
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36. format=$char36. informat=$char36.
;
infile "&csvinputpath/&csvFile." LRECL=32767 TERMSTR=CRLF DLM='|' MISSOVER FIRSTOBS=2 DSD;
INPUT
POLICY_NUMBER : $char25.
POLICY_NUMBER_SRC : $char25.
LINE_OF_BUSINESS : $char3.
PROVINCE : $char10.
LEGACY_UNDERWRITING_COMPANY : $char10.
UNDERWRITING_COMPANY : $char10.
DISTRIBUTOR_NUMBER : $char10.
DISTRIBUTOR_NUMBER_SRC : $char10.
SOURCE_SYSTEM : $char15.
DATASOURCE_BACKEND_REFERENCE : $char10.
LAST_TERM_EFFECTIVE_DATE : YYMMDD10.
LAST_TERM_EXPIRY_DATE : YYMMDD10.
CONTRACT_UUID : $char36.
;
run;
Add a DELETE statement to delete the empty line. If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.
NOTE:
Example:
options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data want;
attrib
POLICY_NUMBER label="Policy Number" length=$25
POLICY_NUMBER_SRC label="Policy Number Source" length=$25
LINE_OF_BUSINESS label="Line of Business" length=$3
PROVINCE label="Province" length=$10
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
UNDERWRITING_COMPANY label="Underwriting Company" length=$10
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
SOURCE_SYSTEM label="Source System" length=$15
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36
;
infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
INPUT POLICY_NUMBER -- CONTRACT_UUID ;
if eof and _infile_=' ' then delete;
run;
proc print;
run;
Log:
NOTE: 3 records were read from the infile CSV. The minimum record length was 0. The maximum record length was 98. NOTE: The data set WORK.WANT has 2 observations and 13 variables.
Results
The SAS System 19:28 Thursday, June 5, 2025 1 D L A E T G A A S L C O A Y D U S _ I R T L U U S C _ A N N T E T S D D D R _ E T P E E I I B R _ O L R R S B A M T L I W W T U C _ E I N R R R T K E R P C E I I I O S E F M C O Y _ T T B R O N F _ O L _ O I I U _ U D E E N I N F N N T N R _ C X T C U _ G G O U C R T P R Y M B P _ _ R M E E I I A _ B U R C C _ B _ F V R C N E S O O O N E S E E Y T U R I V M M U R Y R _ _ _ M _ N I P P M _ S E D D U O B S E N A A B S T N A A U b E R S C N N E R E C T T I s R C S E Y Y R C M E E E D 1 960-6217 AAA-AAA RE QC B 010 BEL CLAS 2008-05-01 2010-05-01 011047fd-c0d6-485f-9712-b5e8068b7b49 2 A12-3358 AAAA-BBBBB AUP QC A 001 3143 CLAS 2005-03-01 2006-03-01
Add a DELETE statement to delete the empty line. If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.
NOTE:
Example:
options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data want;
attrib
POLICY_NUMBER label="Policy Number" length=$25
POLICY_NUMBER_SRC label="Policy Number Source" length=$25
LINE_OF_BUSINESS label="Line of Business" length=$3
PROVINCE label="Province" length=$10
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
UNDERWRITING_COMPANY label="Underwriting Company" length=$10
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
SOURCE_SYSTEM label="Source System" length=$15
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36
;
infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
INPUT POLICY_NUMBER -- CONTRACT_UUID ;
if eof and _infile_=' ' then delete;
run;
proc print;
run;
Log:
NOTE: 3 records were read from the infile CSV. The minimum record length was 0. The maximum record length was 98. NOTE: The data set WORK.WANT has 2 observations and 13 variables.
Results
The SAS System 19:28 Thursday, June 5, 2025 1 D L A E T G A A S L C O A Y D U S _ I R T L U U S C _ A N N T E T S D D D R _ E T P E E I I B R _ O L R R S B A M T L I W W T U C _ E I N R R R T K E R P C E I I I O S E F M C O Y _ T T B R O N F _ O L _ O I I U _ U D E E N I N F N N T N R _ C X T C U _ G G O U C R T P R Y M B P _ _ R M E E I I A _ B U R C C _ B _ F V R C N E S O O O N E S E E Y T U R I V M M U R Y R _ _ _ M _ N I P P M _ S E D D U O B S E N A A B S T N A A U b E R S C N N E R E C T T I s R C S E Y Y R C M E E E D 1 960-6217 AAA-AAA RE QC B 010 BEL CLAS 2008-05-01 2010-05-01 011047fd-c0d6-485f-9712-b5e8068b7b49 2 A12-3358 AAAA-BBBBB AUP QC A 001 3143 CLAS 2005-03-01 2006-03-01
I would like to use Perl Regular Express to find the start of each row . and use SCAN() to get all these variables.
options parmcards=x;
filename x temp;
parmcards;
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data temp;
infile x length=len;
input x $varying5000. len;
if prxmatch('/^\d{3}-\d{4}/',x) then group+1;
run;
data want;
do until(last.group);
set temp;
by group;
length line $ 20000;
line=cats(line,x);
end;
length col1-col4 $ 80;
col1=scan(line,1,'|');
col2=scan(line,2,'|');
col3=scan(line,3,'|');
col4=scan(line,4,'|');
drop x;
run;
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.