BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
RichardAD
Quartz | Level 8

Consider a tab-delimited text file that has a header row, and the data rows can have fields that contain a CR-LF.  Fields (data item) are bounded by double quotes when:

  • the data item contains a double quote, in which case the double quote is doubled
  • the data item contains a CRLF or other control character (TAB, etc.)

Unfortunately PROC IMPORT will not honor the active double quote bounding when it encounters an embedded CRLF and presume the rows ends at that point.

 

Data example (tacit CRLF at end of line):

a<TAB>x<TAB>c
"A<CR><LF>B"<TAB>3.14<TAB>Okay
"AAAA<CR><LF>BCD"<TAB>3.14<TAB>Okay
filename data temp ;
data _null_ ;
file data ; put "61097809630D0A22410D0A422209332E3134094F6B61790D0A"x @@ ;
put "22414141410D0A4243442209332E3134094F6B61790D0A"x @@ ;
stop ;
run ;

Proc IMPORT falters at the embedded CRLF

proc import datafile=data dbms=csv replace out=example ;
  guessingrows=max ;
  delimiter='09'x ;
  getnames=yes ;
run ;

Logs and outputs

Number of names found is greater than number of variables found. 
52             data WORK.EXAMPLE    ;
53             %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
54             infile DATA delimiter='09'x MISSOVER DSD  firstobs=2 ;
55                informat a $12. ;
56                format a $12. ;
57             input
58                         a  $
59             ;
60             if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
61             run;
5 rows created in WORK.EXAMPLE from DATA.
Obs    a
 1     "A            
 2     B"            
 3     "AAAA         
 4     BCD"          
 5  

DSD is good only for masking embedded delimiters and parsing repeated delimiters as a missing value. It will not mask embedded CRLF.

 

The entire file as a single record ("Handling Large Stream Files with the @'string' Feature", Rick Langston) has some promise but would require bookkeeping to track a 'row' according to number of pieces parsed out aligning with number of pieces in first row.

data _null_ ;
  infile DATA delimiter='09'x DSD lrecl=1000 recfm=f;
  length piece $20 ;
  input piece @@ ;
  putlog _n_= piece= ;
  
  if _n_ = 10 then stop ;
run ;
_N_=1 piece=a
_N_=2 piece=x
_N_=3 piece=c        <eol CRLF>
"A                   <data item CRLF> 
B"
_N_=4 piece=3.14
_N_=5 piece=Okay     <eol CRLF>
"AAAA                <data item CRLF>
BCD"
_N_=6 piece=3.14
_N_=7 piece=Okay     <eol>

 

Is this approach worthwhile or are there other features or procedure options that can read the file in a simpler way?

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

In quick search, I found this thread:
https://communities.sas.com/t5/SAS-Programming/Input-a-CSV-file-with-text-fields-that-contain-line-b...

 

Where @Tom links to his utility macro to replace CRLF that appear inside quotes, could be a helpful approach for pre-processing the file:

https://github.com/sasutils/macros/blob/master/replace_crlf.sas

 

View solution in original post

1 REPLY 1
Quentin
Super User

In quick search, I found this thread:
https://communities.sas.com/t5/SAS-Programming/Input-a-CSV-file-with-text-fields-that-contain-line-b...

 

Where @Tom links to his utility macro to replace CRLF that appear inside quotes, could be a helpful approach for pre-processing the file:

https://github.com/sasutils/macros/blob/master/replace_crlf.sas

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 629 views
  • 3 likes
  • 2 in conversation