SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Special Carriages

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Special Carriages

The CSV sample dataset we are using is attached and contains special carriage return characters (0A and 0D that we cannot make SAS read them as such) which in turn cause open quotation marks.  We are trying to either count the number of double-quotes or the number of commas (delimiters) to determine if an observation completes before making a new observation.  When an observation does not complete, I would like to hold that observation within the buffer until the next observation can be added to the end.

   

data _test_ ;

     infile "C:\Desktop\sample_issues.CSV" 

           dlm=','

           dsd

           missover

           lrecl=30000

           firstobs = 2

           sharebuffers

          ;

  

     do until (comma > 5);       /* 5 is an arbitrary number used with this sample file */

          /*infile to buffer */

         input         
         ;

  

          put _infile_ @;

  

          /*Count the number of commas in each observation within the buffer */

     
          comma = countw(_infile_,
',');

          /* If comma variable is less than 165, then current observation needs to be held in buffer

                     until next obs is added to end */

 

          if comma < 5 then

 

               put @ ;

 

          else put _infile_;

            

               /* put _infile_@;*/

       end;

  

     drop comma;

  

run;

Attachment

Accepted Solutions
Solution
‎06-26-2015 06:44 PM
Super User
Super User
Posts: 6,500

Re: Special Carriages

I believe that you can only treat _INFILE_ like a variable if the LRECL of the records is below a certain value like 32K.

You could convert to program to read the lines character by character instead using RECFM=N.

filename copy temp;

data _null_;

  infile 'c:\downloads\sample_issues.csv' recfm=n ;

  file copy recfm=n ;

  input x $char1. ;

  quotes+(x='"');

  if mod(quotes,2) and (x='0D'x or x='0A'x) then put '|';

  else put x $char1.;

run;

data _null_;

  infile copy dsd dlm=',' truncover termstr=CRLF;

  length x1-x6 $200 ;

  input x1-x6;

  put (x1-x6) (=/);

run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,402

Re: Special Carriages

Hi,

See here:

Super User
Super User
Posts: 6,500

Re: Special Carriages

The file you posted appears to only have embedded linefeeds.

So you can read it directly by just using the TERMSTR=CRLF option on the INFILE statement.

New Contributor
Posts: 4

Re: Special Carriages

Tom,

I agree the file contains embedded linefeeds that are corrected by TERMSTR, but the second observation "PROOF OF RETURN RECEIVED REFI" contains a linefeed not picked up by the TERMSTR=CRLF option. I have included the code and output I receive below.

data _test_ ;


infile "C:\Desktop\sample_issues.CSV" 

     dlm=','

     dsd

     termstr=crlf

     missover

     lrecl=30000

     firstobs = 2

     sharebuffers

     ;

length

     Name $10

     ID 8

     Comment $30

     Type $30

     Class 3

     Message $20

     ;

input

     Name $

     ID

     Comment $

     Type $

     Class $

     Message $

     ;

informat

     Name $10.

     ID 8.

     Comment $30.

     Type $30.

     Class 3.

     Message $20.

     ;

run;

Name          ID               Comment                                             Type                                             Class          Message

adfjlkj          515511      aldfjd here                                             100                                               55               aldfjd here

lmnop"        5465          "PROOF OF RETURN RECEIVED                                                            .

REFI"          52              7                                                           hhhttt REFI                                    .

abc              244187      jfa;kj                                                     PLEASE NOTE (35 NOT "53")"    12               wjlkj

kjklwj           266121      jkjoej                                                    250                                                75               jfk;ajs


Super User
Super User
Posts: 6,500

Re: Special Carriages

So you do need to pre-process the file to get rid of the offending CR+LF combinations.

The basic method is to remove or replace the ones that appear inside of quotes.  Here is one method.  I am replacing them with '|', but you could replace with '0A'x if you want.

filename copy temp;

data _null_;

  infile 'c:\downloads\sample_issues.csv'  ;

  file copy ;

  input ;

  put _infile_ @;

  q + lengthn(compress(_infile_,'"','k')) ;

  if 0=mod(q,2) then put;

  else put '|' @;

run;

data _null_;

  infile copy dsd dlm=',' truncover termstr=CRLF;

  length x1-x6 $200 ;

  input x1-x6;

  put (x1-x6) (=/);

run;

New Contributor
Posts: 4

Re: Special Carriages

Tom,

Your above code solved our issue with the sample file; however, when I apply the code to our full file there appears to be additional returns.  The master file contains 262 fields, 250,000+ rows, and is well under the maximum string length.  I applied your code to a smaller file with only 16 fields and 6 records(changing the first infile statement and length, input, and put statements in the second data step to 16).  The code corrected our problem, but when applied to all 262 fields for 6 records and changing the length to $20000 I get linefeeds where previously there weren't any.

Solution
‎06-26-2015 06:44 PM
Super User
Super User
Posts: 6,500

Re: Special Carriages

I believe that you can only treat _INFILE_ like a variable if the LRECL of the records is below a certain value like 32K.

You could convert to program to read the lines character by character instead using RECFM=N.

filename copy temp;

data _null_;

  infile 'c:\downloads\sample_issues.csv' recfm=n ;

  file copy recfm=n ;

  input x $char1. ;

  quotes+(x='"');

  if mod(quotes,2) and (x='0D'x or x='0A'x) then put '|';

  else put x $char1.;

run;

data _null_;

  infile copy dsd dlm=',' truncover termstr=CRLF;

  length x1-x6 $200 ;

  input x1-x6;

  put (x1-x6) (=/);

run;

New Contributor
Posts: 4

Re: Special Carriages

The code above appears to have corrected our carriage return issue; however, upon reading in the CSV file i noticed any numeric field that previously contained decimals are now read in with a space replacing the decimal.  Is this due to reading in as a byte?

Super User
Super User
Posts: 6,500

Re: Special Carriages

The use of $CHAR format should have prevented the periods from being converted to spaces.

data _null_;

   input @1 period $char1. @1 space $1. ;

   put (period space) (=) '_infile_=' _infile_ ;

cards;

.

1

A

;;;;

PROC Star
Posts: 1,093

Re: Special Carriages

I think if you just add lrecl=32767 to the two "infile" stmts in Tom's first code sample it will work.

Respected Advisor
Posts: 3,894

Re: Special Carriages

Is this something you want in the end to implement using SAS DI Studio?

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 662 views
  • 6 likes
  • 5 in conversation