BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
angel17th
Fluorite | Level 6

Hi,

 

I would like to ask for your help regarding the issue on subject. I already tried the solution provided in the https://communities.sas.com/t5/SAS-Programming/Read-in-raw-pipe-enclosed-data-separated-by-commas-SA... solved by  @Tom but the records exceeding the 257 length are truncated causing for misaligned imported records. I've tried changing the lrecl value from 1 to 32767 and the char1 to char32767 but it still doesn't solved the issue with truncation.

 

Another issue with the raw data is that the empty records are not pipe enclosed. Hope that someone can help me with this.

 

Thanks and advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

SAS is highly backward compatible and if your code works in 9.3 then later SAS versions should return the same result. Normally such differences are caused by environmental differences like different options settings and the like. 

 

In my environment option lrecl is set to 32767. If in your env the value is lower than the length of the string you want to write then the default (flowover) will write the not fitting bit to the next line. 

 

To test the theory you could define lrecl for your output file (value at least as long as the max string length you want to write onto a single line).

filename fixed "&target" lrecl=32767;

 

View solution in original post

10 REPLIES 10
Tom
Super User Tom
Super User

Why would records longer than 257 bytes be truncated?

If you cannot supply some example lines from the file then perform your own analysis of the file to see if it is well formed.

For example if you just read in the file without trying to save it you should see the record count and the min and max record length.

data _null_;
  infile 'myfile.csv' lrecl=1000000 ;
  input;
run;

Example:

50   data _null_;
51     infile csv lrecl=1000000;
52     input;
53   run;

NOTE: The infile CSV is:
      (system-specific pathname),
      (system-specific file attributes)

NOTE: 20 records were read from the infile (system-specific pathname).
      The minimum record length was 17.
      The maximum record length was 26.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
angel17th
Fluorite | Level 6

Hi @Tom ,

 

Thank you for your response, and apologies for the lack of details. Here is the sample data I'm reading:

|11081990|,|41|,|X04|,|N-XYXYXYXY/SAMPLE DUMMY DATA TO T|,|05311999|,|1|,|08-JAN-2001 04.28.19.546000 AM|,|1|,|29-APR-2001 04.55.35.938000 PM|,|08182001|,,,,
|09201978|,|42|,|Y05|,|N-XYXYXY/SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE LONG DATA SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE LONG DATA SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF |,|04211988|,|1|,|10-MAY-1989 05.28.30.550000 AM|,|1|,|20-JUN-1989 03.52.35.942000 PM|,|03031989|,,,,
|12031985|,|43|,|Z06|,|K-XYXYXYXYXYXYXYXY/ SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE L|,|09131988|,|1|,|19-SEP-1988 15.28.19.552000 AM|,|1|,|02-OCT-1988 13.44.15.944000 PM|,|03191988|,,,,

If I use the your modified code below:

filename original "IMPORT_DATA.csv";
filename fixed "FIXED_IMPORT_DATA.csv";
data _null_;
  infile original lrecl=1 recfm=f ;
  file fixed termstr=crlf;
  input char $char1.;
  numpipe + (char='|');
  if char='0D'x then return;
  else if char='0A'x then do;
    if mod(numpipe,2)=1 then put '|' @;
    else do;
       numpipe=0;
       put;
    end;
  end;
  else if char='|' then put '"' @;
  else if char='"' then put '""' @;
  else put char $char1. @;
run;

The FIXED_IMPORT_DATA.CSV will be created with the following records:

"11081990","41","X04","N-XYXYXYXY/SAMPLE DUMMY DATA TO T","05311999","1","08-JAN-2001 04.28.19.546000 AM","1","29-APR-2001 04.55.35.938000 PM","08182001",,,,
"09201978","42","Y05","N-XYXYXY/SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE LONG DATA SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF IMPORTED RECORDS DUE TO THE LONG DATA SAMPLE DUMMY DATA TO TEST THE TRUNCATION OF ","04211988","1","10-MAY-1989 05.28.30.550000 AM"
"1","02-OCT-1988 13.44.15.944000 PM","03191988",,,,

You may notice that the second record was truncated and the third record was changed.

 

Patrick
Opal | Level 21

@angel17th Your code works as it should in my Windows environment.

Are you sure this is not just a display issue? When I opened the generated file with Notepad++ and Word Wrapping selected I've seen exactly the same as you shared. But even though the 2nd record gets displayed on two lines it's still a single record (one CRLF). 

Patrick_0-1714624868095.png

And here some code to compare source with target that also confirms that things appear to work as desired.

%let source=c:\temp\IMPORT_DATA.csv;
%let target=c:\temp\FIXED_IMPORT_DATA.csv;

filename original "&source";
filename fixed "&target";
data _null_;
  infile original lrecl=1 recfm=f ;
  file fixed termstr=crlf;
  input char $char1.;
  numpipe + (char='|');
  if char='0D'x then return;
  else if char='0A'x then do;
    if mod(numpipe,2)=1 then put '|' @;
    else do;
       numpipe=0;
       put;
    end;
  end;
  else if char='|' then put '"' @;
  else if char='"' then put '""' @;
  else put char $char1. @;
run;

data source;
  infile "&source" truncover;
  input line:$1000.;
run;
data target;
  infile "&target" truncover;
  input  line:$1000.;
  line=translate(line,'|','"');
run;

proc compare data=source comp=target;
run;

Patrick_0-1714625726593.png

 

angel17th
Fluorite | Level 6

Hi @Patrick,

 

I've tried to import the three dummy records to SAS but it outputted four rows, and it seems that it cuts the 2nd record and the remaining parts of that record were inserted to the 3rd row. test csv.png

 

Please note that I'm using the SAS9 9.3 version, maybe there are some codes that are not compatible?

Patrick
Opal | Level 21

SAS is highly backward compatible and if your code works in 9.3 then later SAS versions should return the same result. Normally such differences are caused by environmental differences like different options settings and the like. 

 

In my environment option lrecl is set to 32767. If in your env the value is lower than the length of the string you want to write then the default (flowover) will write the not fitting bit to the next line. 

 

To test the theory you could define lrecl for your output file (value at least as long as the max string length you want to write onto a single line).

filename fixed "&target" lrecl=32767;

 

angel17th
Fluorite | Level 6

Hi Patrick,

 

Your suggestion worked! Thank you so much! I changed the lrecl for the output file and also added the lrecl=32767 option when importing it to the SAS dataset. 

Patrick
Opal | Level 21

@angel17th 

I believe you could simplify your current code using syntax like below.

data _null_;
  infile original;
  file fixed termstr=crlf;
  input;
  _infile_=tranwrd(_infile_,'"','""');
  _infile_=tranwrd(_infile_,'|','"');
  put _infile_;
run;
angel17th
Fluorite | Level 6

Hi @Patrick,

 

When I tested with your simplified code, I got some warnings about invalid data for the Date format columns. I read the columns first as DATETIME then format them as DATEAMPM. 

 

Sample data of the date format columns that I got warnings from:

01-JUN-2001 10.16.09.000000 AM
Patrick
Opal | Level 21

Are you changing the source file for the sole person of import into a SAS table? If that's the case then below code to directly read your source file into a SAS table.

%let source=c:\temp\IMPORT_DATA.csv;

data work.out;
  infile "&source"
    lrecl=500
    dlm=','
    truncover
    dsd;

  input @;
  _infile_=compress(_infile_,"|");

  input
    var_1  : mmddyy8.
    var_2  : best32.
    var_3  : $3.
    var_4  : $300.
    var_5  : mmddyy8.
    var_6  : best32.
    var_7  : datetime30.
    var_8  : best32.
    var_9  : datetime30.
    var_10 : mmddyy8.
    ;

  format
    var_1    date9.
    var_5    date9.
    var_7    datetime30.6
    var_9    datetime30.6
    var_10   date9.
    ;
run;
angel17th
Fluorite | Level 6

Hi @Patrick,

 

Yes I'm reading the CSV to a SAS table, however, that data of the CSV is a pipe enclosed with a comma delimiter. That is why I initially referenced the solution from Tom when reading the raw CSV. I added replaced the compress with the following code and it also worked:

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 425 views
  • 3 likes
  • 3 in conversation