I'm trying to import clickstream data but i have a field that has multiple commas. i know that the next field always starts with double quotes. maybe there is a scan solution for this?
Here is what i'm using to import:
data Sample ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'E:\Jeff B\Open\ClickStream\Sample.txt' delimiter = ',' MISSOVER DSD lrecl=32767 ;
informat Date date11.;
informat DateTime time. ;
informat IP $15. ;
informat DOM $50. ;
informat DOM_SUB $255. ;
informat DEV $255.;
informat DUR best12.;
format Date date9. ;
format DateTime time18.3;
format IP $15. ;
format DOM $50. ;
format DOM_SUB $255. ;
format DEV $255.;
format DUR best12.;
input
Date
DateTime
IP $
DOM $
DOM_SUB $
DEV $
DUR;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
*please note that record #3 and #7 are not being read in correctly;
Thanks -- Sample data attached.
I've attached the full data set.
You seem to have 7 fields. The 5th one seems to have unprotected commas. The 6th one seems to always start with a space and then a double quote. The 7th looks like a number.
Here is one way to read it. You can adjust the variable names, types and lengths to match your actual values.
data sample;
infile '~/temp/sample.txt' termstr=crlf lrecl=10000 dsd truncover col=cc ;
length x1-x7 $2000 ;
input x1-x4 @ ;
start=cc ;
input @' "' @;
end=cc-1;
x5 = substr(_infile_,start,end-start-2);
input @end x6 x7;
if _n_ in (3,7) then put (_n_ x1-x7) (=/);
run;
You seem to have 7 fields. The 5th one seems to have unprotected commas. The 6th one seems to always start with a space and then a double quote. The 7th looks like a number.
Here is one way to read it. You can adjust the variable names, types and lengths to match your actual values.
data sample;
infile '~/temp/sample.txt' termstr=crlf lrecl=10000 dsd truncover col=cc ;
length x1-x7 $2000 ;
input x1-x4 @ ;
start=cc ;
input @' "' @;
end=cc-1;
x5 = substr(_infile_,start,end-start-2);
input @end x6 x7;
if _n_ in (3,7) then put (_n_ x1-x7) (=/);
run;
works great - Thanks tom!
Tom,
When I try this on the live dataset i'm only able to import the first row. The log file indicates that 'one or more rows might be getting truncated. Here is what the log file looks like:
123 data sample;
124 infile 'E:\Jeff B\Open\ClickStream\Data/centris-11252013.txt'
125 termstr=crlf lrecl=10000 dsd truncover col=cc ;
126 length x1-x7 $2000 ;
127 input x1-x4 @ ;
128 start=cc ;
129 input @' "' @;
130 end=cc-1;
131 x5 = substr(_infile_,start,end-start-2);
132 input @end x6 x7;
133 if _n_ in (3,7) then put (_n_ x1-x7) (=/);
134 run;
NOTE: The infile 'E:\Jeff B\Open\ClickStream\Data/centris-11252013.txt' is:
Filename=E:\Jeff B\Open\ClickStream\Data\centris-11252013.txt,
RECFM=V,LRECL=10000,File Size (bytes)=9123388,
Last Modified=26Nov2013:06:41:42,
Create Time=26Nov2013:13:34:26
NOTE: 1 record was read from the infile 'E:\Jeff B\Open\ClickStream\Data/centris-11252013.txt'.
The minimum record length was 10000.
The maximum record length was 10000.
One or more lines were truncated.
NOTE: The data set WORK.SAMPLE has 1 observations and 9 variables.
NOTE: DATA statement used (Total process time):
real time 0.13 seconds
cpu time 0.14 seconds
any thoughts on what might be causing this?
Thanks!
Most likely the end of line characters are not carriage return and line feed like you coded in your INFILE statement.
You might also need to up the LRECL.
The simple way to check is to read the and let SAS show you want is in the first record.
data _null_;
infile 'E:\Jeff B\Open\ClickStream\Data/centris-11252013.txt' termstr=crlf lrecl=100000 ;
input;
list ;
stop;
run;
When I do that on my Unix machine I see that SAS is showing that the last character has hexadecimal code value of '0A'x which is a line feed.
You can eliminate the TERMSTR option or set it to LF and try again.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.