Hello,
Here's the last line of my csv file read with notepad++
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49cflf
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|crlf
crlf
The last crlf cause to get an empty row that I would like to eliminate during the input statement. How to do that?
data dest1.%scan(&csvFile,1,'.');
attrib
POLICY_NUMBER label="Policy Number" length=$25. format=$char25. informat=$char25.
POLICY_NUMBER_SRC label="Policy Number Source" length=$25. format=$char25. informat=$char25.
LINE_OF_BUSINESS label="Line of Business" length=$3. format=$char3. informat=$char3.
PROVINCE label="Province" length=$10. format=$char10. informat=$char10.
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10. format=$char10. informat=$char10.
UNDERWRITING_COMPANY label="Underwriting Company" length=$10. format=$char10. informat=$char10.
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10. format=$char10. informat=$char10.
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10. format=$char10. informat=$char10.
SOURCE_SYSTEM label="Source System" length=$15. format=$char15. informat=$char15.
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10. format=$char10. informat=$char10.
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36. format=$char36. informat=$char36.
;
infile "&csvinputpath/&csvFile." LRECL=32767 TERMSTR=CRLF DLM='|' MISSOVER FIRSTOBS=2 DSD;
INPUT
POLICY_NUMBER : $char25.
POLICY_NUMBER_SRC : $char25.
LINE_OF_BUSINESS : $char3.
PROVINCE : $char10.
LEGACY_UNDERWRITING_COMPANY : $char10.
UNDERWRITING_COMPANY : $char10.
DISTRIBUTOR_NUMBER : $char10.
DISTRIBUTOR_NUMBER_SRC : $char10.
SOURCE_SYSTEM : $char15.
DATASOURCE_BACKEND_REFERENCE : $char10.
LAST_TERM_EFFECTIVE_DATE : YYMMDD10.
LAST_TERM_EXPIRY_DATE : YYMMDD10.
CONTRACT_UUID : $char36.
;
run;
Add a DELETE statement to delete the empty line. If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.
NOTE:
Example:
options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data want;
attrib
POLICY_NUMBER label="Policy Number" length=$25
POLICY_NUMBER_SRC label="Policy Number Source" length=$25
LINE_OF_BUSINESS label="Line of Business" length=$3
PROVINCE label="Province" length=$10
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
UNDERWRITING_COMPANY label="Underwriting Company" length=$10
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
SOURCE_SYSTEM label="Source System" length=$15
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36
;
infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
INPUT POLICY_NUMBER -- CONTRACT_UUID ;
if eof and _infile_=' ' then delete;
run;
proc print;
run;
Log:
NOTE: 3 records were read from the infile CSV. The minimum record length was 0. The maximum record length was 98. NOTE: The data set WORK.WANT has 2 observations and 13 variables.
Results
The SAS System 19:28 Thursday, June 5, 2025 1 D L A E T G A A S L C O A Y D U S _ I R T L U U S C _ A N N T E T S D D D R _ E T P E E I I B R _ O L R R S B A M T L I W W T U C _ E I N R R R T K E R P C E I I I O S E F M C O Y _ T T B R O N F _ O L _ O I I U _ U D E E N I N F N N T N R _ C X T C U _ G G O U C R T P R Y M B P _ _ R M E E I I A _ B U R C C _ B _ F V R C N E S O O O N E S E E Y T U R I V M M U R Y R _ _ _ M _ N I P P M _ S E D D U O B S E N A A B S T N A A U b E R S C N N E R E C T T I s R C S E Y Y R C M E E E D 1 960-6217 AAA-AAA RE QC B 010 BEL CLAS 2008-05-01 2010-05-01 011047fd-c0d6-485f-9712-b5e8068b7b49 2 A12-3358 AAAA-BBBBB AUP QC A 001 3143 CLAS 2005-03-01 2006-03-01
Add a DELETE statement to delete the empty line. If you only want to delete the empty line when it is the LAST line then add the END= option to your INFILE statement so you know which line is the last one.
NOTE:
Example:
options parmcards=csv;
filename csv temp termstr=crlf;
parmcards;
header row goes here
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data want;
attrib
POLICY_NUMBER label="Policy Number" length=$25
POLICY_NUMBER_SRC label="Policy Number Source" length=$25
LINE_OF_BUSINESS label="Line of Business" length=$3
PROVINCE label="Province" length=$10
LEGACY_UNDERWRITING_COMPANY label="Legacy Underwriting Company" length=$10
UNDERWRITING_COMPANY label="Underwriting Company" length=$10
DISTRIBUTOR_NUMBER label="Distributor Number" length=$10
DISTRIBUTOR_NUMBER_SRC label="Distributor Number Source" length=$10
SOURCE_SYSTEM label="Source System" length=$15
DATASOURCE_BACKEND_REFERENCE label="Data Source Backend Reference" length=$10
LAST_TERM_EFFECTIVE_DATE label="Last Term Effective Date" length=8 format=YYMMDD10. informat=YYMMDD10.
LAST_TERM_EXPIRY_DATE label="Last Term Expiry Date" length=8 format=YYMMDD10. informat=YYMMDD10.
CONTRACT_UUID label="Contract UUID" length=$36
;
infile csv DSD DLM='|' TERMSTR=CRLF TRUNCOVER FIRSTOBS=2 end=eof ;
INPUT POLICY_NUMBER -- CONTRACT_UUID ;
if eof and _infile_=' ' then delete;
run;
proc print;
run;
Log:
NOTE: 3 records were read from the infile CSV. The minimum record length was 0. The maximum record length was 98. NOTE: The data set WORK.WANT has 2 observations and 13 variables.
Results
The SAS System 19:28 Thursday, June 5, 2025 1 D L A E T G A A S L C O A Y D U S _ I R T L U U S C _ A N N T E T S D D D R _ E T P E E I I B R _ O L R R S B A M T L I W W T U C _ E I N R R R T K E R P C E I I I O S E F M C O Y _ T T B R O N F _ O L _ O I I U _ U D E E N I N F N N T N R _ C X T C U _ G G O U C R T P R Y M B P _ _ R M E E I I A _ B U R C C _ B _ F V R C N E S O O O N E S E E Y T U R I V M M U R Y R _ _ _ M _ N I P P M _ S E D D U O B S E N A A B S T N A A U b E R S C N N E R E C T T I s R C S E Y Y R C M E E E D 1 960-6217 AAA-AAA RE QC B 010 BEL CLAS 2008-05-01 2010-05-01 011047fd-c0d6-485f-9712-b5e8068b7b49 2 A12-3358 AAAA-BBBBB AUP QC A 001 3143 CLAS 2005-03-01 2006-03-01
I would like to use Perl Regular Express to find the start of each row . and use SCAN() to get all these variables.
options parmcards=x;
filename x temp;
parmcards;
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
960-6217|AAA-AAA|RE|QC|B|010|BEL||CLAS||2008-05-01|2010-05-01|011047fd-c0d6-485f-9712-b5e8068b7b49
A12-3358|AAAA-BBBBB|AUP|QC|A|001|3143||CLAS||2005-03-01|2006-03-01|
;
data temp;
infile x length=len;
input x $varying5000. len;
if prxmatch('/^\d{3}-\d{4}/',x) then group+1;
run;
data want;
do until(last.group);
set temp;
by group;
length line $ 20000;
line=cats(line,x);
end;
length col1-col4 $ 80;
col1=scan(line,1,'|');
col2=scan(line,2,'|');
col3=scan(line,3,'|');
col4=scan(line,4,'|');
drop x;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.