DATA Step, Macro, Functions and more

Embedded Carriage Returns in a Pipe Delimited CSV File.

Reply
New Contributor
Posts: 4

Embedded Carriage Returns in a Pipe Delimited CSV File.

Hi Guys,

I have a CSV which is pipe delimited.and doesn't include any text identifiers (the individual strings are not surrounded by double inverted commas or anything else for that matter).  Unfortunately some of the text strings contain embedded carriage returns that I obviously need to remove in order to read the file in effectively.

I have searched the internet high and low for a solution and attempted to use what I have located without success.  The solution with the most potential seemed to be a datastep with both infile and file statements using the sharebuffers option.  For some unexplained reason this method simply kills my E.G session (which I know sounds crazy, but that is what happens).  The following code is a reasonable representation (although an over simplification) of what how the data would look if I imported the CSV file.

DATA STUFF;

LENGTH TEXT $1000;

INPUT TEXT $;

DATALINES;

VAR1|VAR2|VAR3|VAR4|VAR5|VAR6|VAR7|VAR8|VAR9|VAR10|VAR11|VAR12

1|2|3|4

|5|6|7|8|

9|10|11|12 ;

RUN;

To get around this I thought that I could read the first line of the text file (containing the variable names which never have embedded carriage returns) into a single variable and identify the number of pipes (11).  I would then do the same for the remainder of the file and where the number of pipes are less than 11 then append the next record (by lagging the first occurance) to the second occurance and then delete any records that don't have 11 pipes.  This works perfectly until you come across an instance where there are multiple carriage returns for a single record.

Can anyone offer a suggestion on how I could continue this process until the number of pipes equals 11?

I am open to other solutions, but this is the best I could come up with.

Any help would be greatly appreciated.

Regards,

Scott

Respected Advisor
Posts: 3,799

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to Scottcom5

You don't mention the OS you are using.  That would be helpful.

I think the TERMSTR INFILE statement option will solve this problem.  Try TERMSTR=CRLF.

If that doesn't work be more specific about the offending character (is is '0A'x?) and your OS.

Super User
Posts: 11,343

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to data_null__

Depending on how messy your data is you might also try the FLOWOVER option on the INFILE statement. This option attempts to continue to read data from the next line when an unexpected end of line occurs.

New Contributor
Posts: 4

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Hi BallardW,

I did try the Flowover option, but was getting some unexpected results.

Thank you for your suggestion though..

Regards,

Scott

Super User
Posts: 11,343

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to Scottcom5

Perhaps you could show an actual copy of the infile and input statements you are using.

My comment about messy data basically meant that if you have empty fields there's likely to be problems keeping the fields synched.

Instead of LAG you might be better off looking at the pointer control input and @ and @@ and parsing data.

It sounds like you have multiple files with different variables from your comment about reading the first line. Will that give you enough information to know whether the variables are text or numeric? Or will you assume all are text and covert any numeric later?


New Contributor
Posts: 4

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to data_null__

Hi data_null_,

I have attempted to use TERMSTR but with no joy.  From the documentation I have read this is only useful when processing a file created in Unix using Windows (and vice versa) as it specifies the end-of-line character for the specified file, which is a CR ('0D'x in this example.

We are using Windows Server 2008 R2 if that helps.

Regards,

Scott

Respected Advisor
Posts: 3,799

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to Scottcom5

Attache the file or a few records.

New Contributor
Posts: 4

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to data_null__

Sorry, I can't provide a sample as the data is confidential.

Respected Advisor
Posts: 3,799

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to Scottcom5

I cannot reproduce your problem with imbeded CR '0D'x in the records. Using SAS 9.2 for winders.  I can produce the problem with LF '0A'x and it can be fixed with TERMSTR.  If you are using SAS 9.1 I recall that TERMSTR did not work.

1981  filename FT44F001 temp;

1982  data _null_;

1983     file FT44F001;

1984     put 'VAR1|VAR2|VAR3|VAR4|VAR5|VAR6|VAR7|VAR8|VAR9|VAR10|VAR11|VAR12';

1985     put '1|2|3|4' '0a'x '|5|6|7|8|' '0a'x '9|10|11|12';

1986     put '1|2|3|4|5|6|7|8|9|10|11|12';

1987     put '1|2|3|4' '0a'x '|5|6|7|8|' '0a'x '9|10|11|12';

1988     run;

NOTE: The file FT44F001 is:

      (system-specific pathname),

      (system-specific file attributes)

NOTE: 4 records were written to the file (system-specific pathname).

      The minimum record length was 26.

      The maximum record length was 62.

NOTE: DATA statement used (Total process time):

      real time           0.06 seconds

      cpu time            0.01 seconds

1989  data _null_;

1990     infile FT44F001;

1991     input;

1992     list;

1993     run;

NOTE: The infile FT44F001 is:

      (system-specific pathname),

      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

1         VAR1|VAR2|VAR3|VAR4|VAR5|VAR6|VAR7|VAR8|VAR9|VAR10|VAR11|VAR12 62

2         1|2|3|4 7

3         |5|6|7|8| 9

4         9|10|11|12 10

5         1|2|3|4|5|6|7|8|9|10|11|12 26

6         1|2|3|4 7

7         |5|6|7|8| 9

8         9|10|11|12 10

NOTE: 8 records were read from the infile (system-specific pathname).

      The minimum record length was 7.

      The maximum record length was 62.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

1994

1995  data _null_;

1996     infile FT44F001 termstr=crlf;

1997     input;

1998     list;

1999     run;

NOTE: The infile FT44F001 is:

      (system-specific pathname),

      (system-specific file attributes)

RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0

1         VAR1|VAR2|VAR3|VAR4|VAR5|VAR6|VAR7|VAR8|VAR9|VAR10|VAR11|VAR12 62

2   CHAR  1|2|3|4.|5|6|7|8|.9|10|11|12 28

    ZONE  3737373073737373703733733733

    NUMR  1C2C3C4AC5C6C7C8CA9C10C11C12

3         1|2|3|4|5|6|7|8|9|10|11|12 26

4   CHAR  1|2|3|4.|5|6|7|8|.9|10|11|12 28

    ZONE  3737373073737373703733733733

    NUMR  1C2C3C4AC5C6C7C8CA9C10C11C12

NOTE: 4 records were read from the infile (system-specific pathname).

      The minimum record length was 26.

      The maximum record length was 62.

NOTE: DATA statement used (Total process time):

      real time           0.04 seconds

      cpu time            0.01 seconds

Super User
Posts: 10,041

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to Scottcom5

Or Can you make some dummy data to appear your problem again.

And Best is posting the output you want too.

Regular Contributor
Posts: 151

Re: Embedded Carriage Returns in a Pipe Delimited CSV File.

Posted in reply to Scottcom5

This may be helpful:

https:/communities.sas.com/thread/34015

Oleg

Ask a Question
Discussion stats
  • 10 replies
  • 2021 views
  • 6 likes
  • 5 in conversation