BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Rarry
Calcite | Level 5

The CSV sample dataset we are using is attached and contains special carriage return characters (0A and 0D that we cannot make SAS read them as such) which in turn cause open quotation marks.  We are trying to either count the number of double-quotes or the number of commas (delimiters) to determine if an observation completes before making a new observation.  When an observation does not complete, I would like to hold that observation within the buffer until the next observation can be added to the end.

   

data _test_ ;

     infile "C:\Desktop\sample_issues.CSV" 

           dlm=','

           dsd

           missover

           lrecl=30000

           firstobs = 2

           sharebuffers

          ;

  

     do until (comma > 5);       /* 5 is an arbitrary number used with this sample file */

          /*infile to buffer */

         input         
         ;

  

          put _infile_ @;

  

          /*Count the number of commas in each observation within the buffer */

     
          comma = countw(_infile_,
',');

          /* If comma variable is less than 165, then current observation needs to be held in buffer

                     until next obs is added to end */

 

          if comma < 5 then

 

               put @ ;

 

          else put _infile_;

            

               /* put _infile_@;*/

       end;

  

     drop comma;

  

run;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I believe that you can only treat _INFILE_ like a variable if the LRECL of the records is below a certain value like 32K.

You could convert to program to read the lines character by character instead using RECFM=N.

filename copy temp;

data _null_;

  infile 'c:\downloads\sample_issues.csv' recfm=n ;

  file copy recfm=n ;

  input x $char1. ;

  quotes+(x='"');

  if mod(quotes,2) and (x='0D'x or x='0A'x) then put '|';

  else put x $char1.;

run;

data _null_;

  infile copy dsd dlm=',' truncover termstr=CRLF;

  length x1-x6 $200 ;

  input x1-x6;

  put (x1-x6) (=/);

run;

View solution in original post

10 REPLIES 10
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

See here:

Tom
Super User Tom
Super User

The file you posted appears to only have embedded linefeeds.

So you can read it directly by just using the TERMSTR=CRLF option on the INFILE statement.

Rarry
Calcite | Level 5

Tom,

I agree the file contains embedded linefeeds that are corrected by TERMSTR, but the second observation "PROOF OF RETURN RECEIVED REFI" contains a linefeed not picked up by the TERMSTR=CRLF option. I have included the code and output I receive below.

data _test_ ;


infile "C:\Desktop\sample_issues.CSV" 

     dlm=','

     dsd

     termstr=crlf

     missover

     lrecl=30000

     firstobs = 2

     sharebuffers

     ;

length

     Name $10

     ID 8

     Comment $30

     Type $30

     Class 3

     Message $20

     ;

input

     Name $

     ID

     Comment $

     Type $

     Class $

     Message $

     ;

informat

     Name $10.

     ID 8.

     Comment $30.

     Type $30.

     Class 3.

     Message $20.

     ;

run;

Name          ID               Comment                                             Type                                             Class          Message

adfjlkj          515511      aldfjd here                                             100                                               55               aldfjd here

lmnop"        5465          "PROOF OF RETURN RECEIVED                                                            .

REFI"          52              7                                                           hhhttt REFI                                    .

abc              244187      jfa;kj                                                     PLEASE NOTE (35 NOT "53")"    12               wjlkj

kjklwj           266121      jkjoej                                                    250                                                75               jfk;ajs


Tom
Super User Tom
Super User

So you do need to pre-process the file to get rid of the offending CR+LF combinations.

The basic method is to remove or replace the ones that appear inside of quotes.  Here is one method.  I am replacing them with '|', but you could replace with '0A'x if you want.

filename copy temp;

data _null_;

  infile 'c:\downloads\sample_issues.csv'  ;

  file copy ;

  input ;

  put _infile_ @;

  q + lengthn(compress(_infile_,'"','k')) ;

  if 0=mod(q,2) then put;

  else put '|' @;

run;

data _null_;

  infile copy dsd dlm=',' truncover termstr=CRLF;

  length x1-x6 $200 ;

  input x1-x6;

  put (x1-x6) (=/);

run;

Rarry
Calcite | Level 5

Tom,

Your above code solved our issue with the sample file; however, when I apply the code to our full file there appears to be additional returns.  The master file contains 262 fields, 250,000+ rows, and is well under the maximum string length.  I applied your code to a smaller file with only 16 fields and 6 records(changing the first infile statement and length, input, and put statements in the second data step to 16).  The code corrected our problem, but when applied to all 262 fields for 6 records and changing the length to $20000 I get linefeeds where previously there weren't any.

Tom
Super User Tom
Super User

I believe that you can only treat _INFILE_ like a variable if the LRECL of the records is below a certain value like 32K.

You could convert to program to read the lines character by character instead using RECFM=N.

filename copy temp;

data _null_;

  infile 'c:\downloads\sample_issues.csv' recfm=n ;

  file copy recfm=n ;

  input x $char1. ;

  quotes+(x='"');

  if mod(quotes,2) and (x='0D'x or x='0A'x) then put '|';

  else put x $char1.;

run;

data _null_;

  infile copy dsd dlm=',' truncover termstr=CRLF;

  length x1-x6 $200 ;

  input x1-x6;

  put (x1-x6) (=/);

run;

Rarry
Calcite | Level 5

The code above appears to have corrected our carriage return issue; however, upon reading in the CSV file i noticed any numeric field that previously contained decimals are now read in with a space replacing the decimal.  Is this due to reading in as a byte?

Tom
Super User Tom
Super User

The use of $CHAR format should have prevented the periods from being converted to spaces.

data _null_;

   input @1 period $char1. @1 space $1. ;

   put (period space) (=) '_infile_=' _infile_ ;

cards;

.

1

A

;;;;

TomKari
Onyx | Level 15

I think if you just add lrecl=32767 to the two "infile" stmts in Tom's first code sample it will work.

Patrick
Opal | Level 21

Is this something you want in the end to implement using SAS DI Studio?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 3440 views
  • 7 likes
  • 5 in conversation