Help using Base SAS procedures

Importing CSV with multiple delimiters in one field

Accepted Solution Solved
Reply
Contributor
Posts: 62
Accepted Solution

Importing CSV with multiple delimiters in one field

I'm trying to import clickstream data but i have a field that has multiple commas. i know that the next field always starts with double quotes. maybe there is a scan solution for this?

Here is what i'm using to import:

data Sample                       ;

%let _EFIERR_ = 0; /* set the ERROR detection macro variable */

  infile 'E:\Jeff B\Open\ClickStream\Sample.txt' delimiter = ',' MISSOVER DSD lrecl=32767 ;

  informat Date  date11.;

  informat DateTime time. ;

  informat IP $15. ;

  informat DOM $50. ;

  informat DOM_SUB $255. ;

  informat DEV $255.;

  informat DUR best12.;

  format Date date9. ;

  format DateTime time18.3;

  format IP $15. ;

  format DOM $50. ;

  format DOM_SUB $255. ;

  format DEV $255.;

  format DUR best12.;

  input

  Date

  DateTime

  IP $

  DOM $

  DOM_SUB $

  DEV $

  DUR;

if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */

run;

*please note that record #3 and #7 are not being read in correctly;

Thanks -- Sample data attached.

I've attached the full data set.

Attachment
Attachment

Accepted Solutions
Solution
‎11-26-2013 03:39 PM
Super User
Super User
Posts: 6,500

Re: Importing CSV with multiple delimiters in one field

You seem to have 7 fields.  The 5th one seems to have unprotected commas. The 6th one seems to always start with a space and then a double quote.  The 7th looks like a number.

Here is one way to read it.  You can adjust the variable names, types and lengths to match your actual values.

data sample;

infile '~/temp/sample.txt' termstr=crlf lrecl=10000 dsd truncover col=cc ;

length x1-x7 $2000 ;

input x1-x4 @ ;

start=cc ;

input @' "' @;

end=cc-1;

x5 = substr(_infile_,start,end-start-2);

input @end x6 x7;

if _n_ in (3,7) then put (_n_ x1-x7) (=/);

run;

View solution in original post


All Replies
Solution
‎11-26-2013 03:39 PM
Super User
Super User
Posts: 6,500

Re: Importing CSV with multiple delimiters in one field

You seem to have 7 fields.  The 5th one seems to have unprotected commas. The 6th one seems to always start with a space and then a double quote.  The 7th looks like a number.

Here is one way to read it.  You can adjust the variable names, types and lengths to match your actual values.

data sample;

infile '~/temp/sample.txt' termstr=crlf lrecl=10000 dsd truncover col=cc ;

length x1-x7 $2000 ;

input x1-x4 @ ;

start=cc ;

input @' "' @;

end=cc-1;

x5 = substr(_infile_,start,end-start-2);

input @end x6 x7;

if _n_ in (3,7) then put (_n_ x1-x7) (=/);

run;

Contributor
Posts: 62

Re: Importing CSV with multiple delimiters in one field

works great - Thanks tom!

Contributor
Posts: 62

Re: Importing CSV with multiple delimiters in one field

Tom,

When I try this on the live dataset i'm only able to import the first row. The log file indicates that 'one or more rows might be getting truncated. Here is what the log file looks like:

123  data sample;

124      infile 'E:\Jeff B\Open\ClickStream\Data/centris-11252013.txt'

125      termstr=crlf lrecl=10000 dsd truncover col=cc ;

126      length x1-x7 $2000 ;

127          input x1-x4 @ ;

128              start=cc ;

129              input @' "' @;

130              end=cc-1;

131              x5 = substr(_infile_,start,end-start-2);

132          input @end x6 x7;

133      if _n_ in (3,7) then put (_n_ x1-x7) (=/);

134  run;

NOTE: The infile 'E:\Jeff B\Open\ClickStream\Data/centris-11252013.txt' is:

      Filename=E:\Jeff B\Open\ClickStream\Data\centris-11252013.txt,

      RECFM=V,LRECL=10000,File Size (bytes)=9123388,

      Last Modified=26Nov2013:06:41:42,

      Create Time=26Nov2013:13:34:26

NOTE: 1 record was read from the infile 'E:\Jeff B\Open\ClickStream\Data/centris-11252013.txt'.

      The minimum record length was 10000.

      The maximum record length was 10000.

      One or more lines were truncated.

NOTE: The data set WORK.SAMPLE has 1 observations and 9 variables.

NOTE: DATA statement used (Total process time):

      real time           0.13 seconds

      cpu time            0.14 seconds

any thoughts on what might be causing this?

Thanks!

Super User
Super User
Posts: 6,500

Re: Importing CSV with multiple delimiters in one field

Most likely the end of line characters are not carriage return and line feed like you coded in your INFILE statement.

You might also need to up the LRECL.

The simple way to check is to read the and let SAS show you want is in the first record.

data _null_;

  infile 'E:\Jeff B\Open\ClickStream\Data/centris-11252013.txt'  termstr=crlf lrecl=100000 ;

  input;

  list ;

  stop;

run;

When I do that on my Unix machine I see that SAS is showing that the last character has hexadecimal code value of '0A'x which is a line feed.

You can eliminate the TERMSTR option or set it to LF and try again.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 707 views
  • 0 likes
  • 2 in conversation