- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I have an csv file with around 9 fields, one of this field (field 7) is an free text column in the source system.
I.E., user can type an message in it, so this field has carriage return value, special char, quotes, alt+enter.
Below you can see one record. Most of the records are without CR in the text field and is one row in the csv file. But some are with and appear as below.
how can i import it in a easy way (we are using standard SAS integration studio) in a file reader? or programming? is there an option which can help me?
Thanks in advance.
regards,
Ann
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If the lines are ended with CR+LF, which is normal for a DOS/Windows file, and the values have only CR or LF then the file should work fine if you just make sure to use the TERMSTR=CRLF option on the INFILE statement. If you want to use PROC IMPORT to guess how to read the file than make a fileref that points to the file so you can add the TERMSTR=CRLF option to the FILENAME statement.
If the embedded characters are exactly the same as the normal end of line characters then you will need to first preprocess the file to fix that. When the values are quoted that can be done by just counting how many quotes you have seen.
Here is macro that will do that for you: https://github.com/sasutils/macros/blob/master/replace_crlf.sas
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This works
filename CSV "~/aa.txt";
data WANT;
infile CSV dsd dlm=';' flowover termstr=crlf;
informat A B C $8.;
input A B C;
run;
for file
"aa";"aa
aa";"zz"
A | B | C |
---|---|---|
aa | aa aa | zz |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Ah this in on Linux though. I suppose you're on Windows?
The EOL character is different.
Never mind.
You need to read one character at at a time.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's an example of how you can process the data as you read it.
filename CSV "~/aa.txt";
data _null_;
file CSV;
put '"aa";"bb' '0d0a'x 'bb";"cc"' '0d0a'x '"xx";"yy";"zz"';
put '"11";"22";"33"';
run;
data WANT;
infile CSV recfm=n eof=EOF;%* Read the file in stream mode;
length A B C TMP $8 ; %* Declare variables;
retain A B TMP; %* Retain variables
input X $1. @@; %* Save one character of data;
if X='"' then do; %* Look for quote in data;
Q+1; %* Count quotes;
if Q=2 then do; %* Second quote: End of string;
VARNO+1; %* Count variables ;
if VARNO=1 then A=TMP; %* Save value into variable;
if VARNO=2 then B=TMP; %* Save value into variable;
if VARNO=3 then do; %* Last variable: Save record;
C=TMP; %* Save value into variable;
output; %* Save record;
VARNO=-1; %* Reset variable counter;
end;
else do; %* Not last variable: skip to next data character;
do until (X not in (';','"','0d'x,'0a'x));
input X $1. @@;
end;
TMP=X; %* Save first character of data;
end;
Q=1; %* Second quote processed, reset quote counter;
end;
end;
else
TMP=catt(TMP,X); %* Character is not a quote: Append it into value;
return; %* Go to top to read next character;
eof: %* EOF found;
stop; %* Stop the data step;
keep A B C;
run;
proc print noobs;
run;
A | B | C |
---|---|---|
aa | bb bb | cc |
xx | yy | zz |
11 | 22 | 33 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If the lines are ended with CR+LF, which is normal for a DOS/Windows file, and the values have only CR or LF then the file should work fine if you just make sure to use the TERMSTR=CRLF option on the INFILE statement. If you want to use PROC IMPORT to guess how to read the file than make a fileref that points to the file so you can add the TERMSTR=CRLF option to the FILENAME statement.
If the embedded characters are exactly the same as the normal end of line characters then you will need to first preprocess the file to fix that. When the values are quoted that can be done by just counting how many quotes you have seen.
Here is macro that will do that for you: https://github.com/sasutils/macros/blob/master/replace_crlf.sas
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
https://communities.sas.com/t5/SAS-Programming/Input-a-CSV-file-with-text-fields-that-contain-line-b...
https://communities.sas.com/t5/SAS-Programming/Proc-import-issue-where-the-data-is-both-comma-separa...
https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-properly-read-a-csv-file/m-p/855247