Consider a tab-delimited text file that has a header row, and the data rows can have fields that contain a CR-LF. Fields (data item) are bounded by double quotes when:
Unfortunately PROC IMPORT will not honor the active double quote bounding when it encounters an embedded CRLF and presume the rows ends at that point.
Data example (tacit CRLF at end of line):
a<TAB>x<TAB>c
"A<CR><LF>B"<TAB>3.14<TAB>Okay
"AAAA<CR><LF>BCD"<TAB>3.14<TAB>Okay
filename data temp ; data _null_ ;
file data ; put "61097809630D0A22410D0A422209332E3134094F6B61790D0A"x @@ ;
put "22414141410D0A4243442209332E3134094F6B61790D0A"x @@ ;
stop ;
run ;
Proc IMPORT falters at the embedded CRLF
proc import datafile=data dbms=csv replace out=example ;
guessingrows=max ;
delimiter='09'x ;
getnames=yes ;
run ;
Logs and outputs
Number of names found is greater than number of variables found.
52 data WORK.EXAMPLE ;
53 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
54 infile DATA delimiter='09'x MISSOVER DSD firstobs=2 ;
55 informat a $12. ;
56 format a $12. ;
57 input
58 a $
59 ;
60 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
61 run;
5 rows created in WORK.EXAMPLE from DATA.
Obs a
1 "A
2 B"
3 "AAAA
4 BCD"
5
DSD is good only for masking embedded delimiters and parsing repeated delimiters as a missing value. It will not mask embedded CRLF.
The entire file as a single record ("Handling Large Stream Files with the @'string' Feature", Rick Langston) has some promise but would require bookkeeping to track a 'row' according to number of pieces parsed out aligning with number of pieces in first row.
data _null_ ;
infile DATA delimiter='09'x DSD lrecl=1000 recfm=f;
length piece $20 ;
input piece @@ ;
putlog _n_= piece= ;
if _n_ = 10 then stop ;
run ;_N_=1 piece=a
_N_=2 piece=x
_N_=3 piece=c <eol CRLF>
"A <data item CRLF>
B"
_N_=4 piece=3.14
_N_=5 piece=Okay <eol CRLF>
"AAAA <data item CRLF>
BCD"
_N_=6 piece=3.14
_N_=7 piece=Okay <eol>
Is this approach worthwhile or are there other features or procedure options that can read the file in a simpler way?
In quick search, I found this thread:
https://communities.sas.com/t5/SAS-Programming/Input-a-CSV-file-with-text-fields-that-contain-line-b...
Where @Tom links to his utility macro to replace CRLF that appear inside quotes, could be a helpful approach for pre-processing the file:
https://github.com/sasutils/macros/blob/master/replace_crlf.sas
In quick search, I found this thread:
https://communities.sas.com/t5/SAS-Programming/Input-a-CSV-file-with-text-fields-that-contain-line-b...
Where @Tom links to his utility macro to replace CRLF that appear inside quotes, could be a helpful approach for pre-processing the file:
https://github.com/sasutils/macros/blob/master/replace_crlf.sas
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.