I need to import a pipe delimited file in which the character columns are within quotes (" "). However some of these columns contain a Carriage Return (probably when enter pressed while filling in the data in same cell). Hence the data is looking something like:
Var1 | "Var2" | Var3
1 | "Hi. This contains a break
here" | 105
2 | "Another break
here and
here" | 109
I am using SAS 9.2. Can someone help me out in order to resolve the issue and import it into SAS as :
Var1 | "Var2" | Var3
1 | "Hi. This contains a break here" | 105
2 | "Another break here and here" | 109
Thanks.
Gaurav
Thanks for suggesting your own solution. I see two possibly minor drawbacks:
Alternatively, you could first find out what type of line break really occurs within the quoted strings. Just open the text file with a hex editor or read (relevant parts of) it with SAS using INFILE with RECFM=F and display the contents in $HEXw. format.
Example:
data _null_;
length string $32;
infile cdata recfm=f lrecl=32 truncover obs=10;
input string $char32.;
put string $hex64.;
run;
You wrote that @data_null__'s promising suggestion "didn't work out." This would be the case if the line breaks in your quoted strings were CRLF ('0D0A'x) characters.
In this case you could use the little program from http://support.sas.com/techsup/technote/ts673.pdf, p. 6 (which is similar to yours, but targeted at LF characters) to overwrite the LFs ('0A'x) within quoted strings with blanks. (To avoid overwriting the original text file, you could, of course, use file outdata ... as in your approach.)
You could then read the modified file as usual, remove the remaining CR ('0D'x) characters and compress multiple blanks (from multiple line breaks as in your sample data) to single blanks (unless there are other multiple blanks which are worth preserving):
data test;
length ID_IYXZ ID_LMNO 8 TX_ABCD $256; /* please adapt length as appropriate */
infile outdata dlm='|' dsd firstobs=2;
input ID_IYXZ ID_LMNO TX_ABCD :$quote.;
TX_ABCD=compbl(compress(TX_ABCD, '0D'x));
run;
It is probably linefeed 0Ax. Try using INFILE statement option TERMSTR=CRLF. Assuming your records are DOS based.
Thanks for replying. I already tried the option but it didn't work out.
Is there any other alternative available?
Please post a sample of your file.
I found this as a working solution.
Working on the raw csv in binary format and removing special characters, then simply proc import the new csv.
data _null_;
infile cdata lrecl=1 recfm=n end=eof;
file outdata lrecl=1000000;
nq=0;
do until (eof or (char='0A'X and mod(nq,2)=0));
input char $char1.;
nq=nq+(char='"') ;
if char not in ('0D'x,'0A'x) then do;
char1 = prxchange('s/[^\x20-\x7E\x0A\x0D]//', -1, char);
if missing(char)=1 or missing(char1)=0 then put char1 $char1. @ ;
end;
end;
put ;
run;
Can someone confirm this and please explain any drawbacks/problems this might cause?
Thanks for suggesting your own solution. I see two possibly minor drawbacks:
Alternatively, you could first find out what type of line break really occurs within the quoted strings. Just open the text file with a hex editor or read (relevant parts of) it with SAS using INFILE with RECFM=F and display the contents in $HEXw. format.
Example:
data _null_;
length string $32;
infile cdata recfm=f lrecl=32 truncover obs=10;
input string $char32.;
put string $hex64.;
run;
You wrote that @data_null__'s promising suggestion "didn't work out." This would be the case if the line breaks in your quoted strings were CRLF ('0D0A'x) characters.
In this case you could use the little program from http://support.sas.com/techsup/technote/ts673.pdf, p. 6 (which is similar to yours, but targeted at LF characters) to overwrite the LFs ('0A'x) within quoted strings with blanks. (To avoid overwriting the original text file, you could, of course, use file outdata ... as in your approach.)
You could then read the modified file as usual, remove the remaining CR ('0D'x) characters and compress multiple blanks (from multiple line breaks as in your sample data) to single blanks (unless there are other multiple blanks which are worth preserving):
data test;
length ID_IYXZ ID_LMNO 8 TX_ABCD $256; /* please adapt length as appropriate */
infile outdata dlm='|' dsd firstobs=2;
input ID_IYXZ ID_LMNO TX_ABCD :$quote.;
TX_ABCD=compbl(compress(TX_ABCD, '0D'x));
run;
This could give you a start . You need change the code according to your real data. Especiall which one is the start of a row .
data have;
infile '/folders/myfolders/x.txt' dsd delimiter='|' recfm=n ;
input (x) (: $40.) @@ ;
x1=scan(x,1,'0D0A'x);
x2=scan(x,2,'0D0A'x);
if notdigit(strip(x2))=0 and not missing(x2) then do;
x=x1;output;
group+1;x=x2;output;
end;
else output;
drop x1 x2;
run;
proc transpose data=have out=want(drop=_:);
by group;
var x;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.
Find more tutorials on the SAS Users YouTube channel.