Hi,
I would like to ask for your help regarding the issue on subject. I already tried the solution provided in the https://communities.sas.com/t5/SAS-Programming/Read-in-raw-pipe-enclosed-data-separated-by-commas-SA... solved by @Tom but the records exceeding the 257 length are truncated causing for misaligned imported records. I've tried changing the lrecl value from 1 to 32767 and the char1 to char32767 but it still doesn't solved the issue with truncation.
Another issue with the raw data is that the empty records are not pipe enclosed. Hope that someone can help me with this.
Thanks and advance!
SAS is highly backward compatible and if your code works in 9.3 then later SAS versions should return the same result. Normally such differences are caused by environmental differences like different options settings and the like.
In my environment option lrecl is set to 32767. If in your env the value is lower than the length of the string you want to write then the default (flowover) will write the not fitting bit to the next line.
To test the theory you could define lrecl for your output file (value at least as long as the max string length you want to write onto a single line).
filename fixed "&target" lrecl=32767;
Why would records longer than 257 bytes be truncated?
If you cannot supply some example lines from the file then perform your own analysis of the file to see if it is well formed.
For example if you just read in the file without trying to save it you should see the record count and the min and max record length.
data _null_;
infile 'myfile.csv' lrecl=1000000 ;
input;
run;
Example:
50 data _null_; 51 infile csv lrecl=1000000; 52 input; 53 run; NOTE: The infile CSV is: (system-specific pathname), (system-specific file attributes) NOTE: 20 records were read from the infile (system-specific pathname). The minimum record length was 17. The maximum record length was 26. NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Hi @Tom ,
Thank you for your response, and apologies for the lack of details. Here is the sample data I'm reading:
|11081990|,|41|,|X04|,|N-XYXYXYXY/SAMPLE DUMMY DATA TO T|,|05311999|,|1|,|08-JAN-2001 04.28.19.546000 AM|,|1|,|29-APR-2001 04.55.35.938000 PM|,|08182001|,,,,
|09201978|,|42|,|Y05|,|N-XYXYXY/SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE LONG DATA SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE LONG DATA SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF |,|04211988|,|1|,|10-MAY-1989 05.28.30.550000 AM|,|1|,|20-JUN-1989 03.52.35.942000 PM|,|03031989|,,,,
|12031985|,|43|,|Z06|,|K-XYXYXYXYXYXYXYXY/ SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE L|,|09131988|,|1|,|19-SEP-1988 15.28.19.552000 AM|,|1|,|02-OCT-1988 13.44.15.944000 PM|,|03191988|,,,,
If I use the your modified code below:
filename original "IMPORT_DATA.csv";
filename fixed "FIXED_IMPORT_DATA.csv";
data _null_;
infile original lrecl=1 recfm=f ;
file fixed termstr=crlf;
input char $char1.;
numpipe + (char='|');
if char='0D'x then return;
else if char='0A'x then do;
if mod(numpipe,2)=1 then put '|' @;
else do;
numpipe=0;
put;
end;
end;
else if char='|' then put '"' @;
else if char='"' then put '""' @;
else put char $char1. @;
run;
The FIXED_IMPORT_DATA.CSV will be created with the following records:
"11081990","41","X04","N-XYXYXYXY/SAMPLE DUMMY DATA TO T","05311999","1","08-JAN-2001 04.28.19.546000 AM","1","29-APR-2001 04.55.35.938000 PM","08182001",,,,
"09201978","42","Y05","N-XYXYXY/SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE LONG DATA SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE LONG DATA SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF ","04211988","1","10-MAY-1989 05.28.30.550000 AM"
"1","02-OCT-1988 13.44.15.944000 PM","03191988",,,,
You may notice that the second record was truncated and the third record was changed.
@angel17th Your code works as it should in my Windows environment.
Are you sure this is not just a display issue? When I opened the generated file with Notepad++ and Word Wrapping selected I've seen exactly the same as you shared. But even though the 2nd record gets displayed on two lines it's still a single record (one CRLF).
And here some code to compare source with target that also confirms that things appear to work as desired.
%let source=c:\temp\IMPORT_DATA.csv;
%let target=c:\temp\FIXED_IMPORT_DATA.csv;
filename original "&source";
filename fixed "&target";
data _null_;
infile original lrecl=1 recfm=f ;
file fixed termstr=crlf;
input char $char1.;
numpipe + (char='|');
if char='0D'x then return;
else if char='0A'x then do;
if mod(numpipe,2)=1 then put '|' @;
else do;
numpipe=0;
put;
end;
end;
else if char='|' then put '"' @;
else if char='"' then put '""' @;
else put char $char1. @;
run;
data source;
infile "&source" truncover;
input line:$1000.;
run;
data target;
infile "&target" truncover;
input line:$1000.;
line=translate(line,'|','"');
run;
proc compare data=source comp=target;
run;
Hi @Patrick,
I've tried to import the three dummy records to SAS but it outputted four rows, and it seems that it cuts the 2nd record and the remaining parts of that record were inserted to the 3rd row.
Please note that I'm using the SAS9 9.3 version, maybe there are some codes that are not compatible?
SAS is highly backward compatible and if your code works in 9.3 then later SAS versions should return the same result. Normally such differences are caused by environmental differences like different options settings and the like.
In my environment option lrecl is set to 32767. If in your env the value is lower than the length of the string you want to write then the default (flowover) will write the not fitting bit to the next line.
To test the theory you could define lrecl for your output file (value at least as long as the max string length you want to write onto a single line).
filename fixed "&target" lrecl=32767;
Hi Patrick,
Your suggestion worked! Thank you so much! I changed the lrecl for the output file and also added the lrecl=32767 option when importing it to the SAS dataset.
I believe you could simplify your current code using syntax like below.
data _null_;
infile original;
file fixed termstr=crlf;
input;
_infile_=tranwrd(_infile_,'"','""');
_infile_=tranwrd(_infile_,'|','"');
put _infile_;
run;
Hi @Patrick,
When I tested with your simplified code, I got some warnings about invalid data for the Date format columns. I read the columns first as DATETIME then format them as DATEAMPM.
Sample data of the date format columns that I got warnings from:
01-JUN-2001 10.16.09.000000 AM
Are you changing the source file for the sole person of import into a SAS table? If that's the case then below code to directly read your source file into a SAS table.
%let source=c:\temp\IMPORT_DATA.csv;
data work.out;
infile "&source"
lrecl=500
dlm=','
truncover
dsd;
input @;
_infile_=compress(_infile_,"|");
input
var_1 : mmddyy8.
var_2 : best32.
var_3 : $3.
var_4 : $300.
var_5 : mmddyy8.
var_6 : best32.
var_7 : datetime30.
var_8 : best32.
var_9 : datetime30.
var_10 : mmddyy8.
;
format
var_1 date9.
var_5 date9.
var_7 datetime30.6
var_9 datetime30.6
var_10 date9.
;
run;
Hi @Patrick,
Yes I'm reading the CSV to a SAS table, however, that data of the CSV is a pipe enclosed with a comma delimiter. That is why I initially referenced the solution from Tom when reading the raw CSV. I added replaced the compress with the following code and it also worked:
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.