Hi Experts,
I have a table where I have 3 variables of length 32,767 each.
I am reading data from a CSV file. Some times we expect these fields to contain 32,767 characters.
In that case the CSV file length goes above 33,000.
So in this case SAS truncates the data or it does not behave as it is supposed to.
1. Can we increase the logical record length to handle more data in input buffer?
2. Is there more elegant solution to this?
Code:
*replace NewLines,Line Feed,Line Break with space if any and create a temp file;
filename workfile "\\vendorFile\tmp.csv";
data _null_ ;
if eof then put 'NOTE: Records read=' newn 'Records with missing quotes=' missq ;
infile "\\eDiary_small.csv" lrecl=32767 end=eof ;
file workfile ;
nq=0;
do until (mod(nq,2)=0 or eof );
input;
newn+1;
nq = nq + countc(_infile_,'"');
put _infile_ @;
if mod(nq,2) then do;
missq+1;
put " " @;
end;
end;
put;
run;
*Read the File to create SAS dataset;
data sas_vout.eDiary;
infile workfile dlm="," dsd firstobs=2 lrecl=32767;
input PATIENT_S_DUE_TIME_C:$24. PATIENT_S_TIMEZONE__C:$255. PATIENT_ID_SUBJECT_ID__C:$255.
VTD2_STUDY_NICKNAME__C:$255. VTR4_CROSS_REFERENCE_SUBJECT_ID_:$100. ANSWER_ID:$18. DIARY_ID:$18. NAME:$80. VTD1_SURVEY__C:$18.
VTD1_ANSWER__C:$32767. VTR2_ENGLISH_ANSWER__C:$32767. VTR2_QUESTION_CONTENT__C:$32767. CREATEDDATE:$24.
VTD1_COMPLETION_TIME__C:$24. VTD1_DATE_AVAILABLE__C:$24. VTD1_DUE_DATE__C:$24. VTD2_PATIENT_S_DUE_TIME__C:$24.
VTD2_PATIENT_S_NUMERIC_TIMEZONE_:$255. VTD1_STATUS__C:$255. VTR2_DOC_LANGUAGE__C:$5. VTD1_REMINDER_DUE_DATE__C:$24.
VTD1_REMINDER_WINDOW__C:$18. VTD1_NUMBER_OF_ANSWERS__C:$18. VTD1_NUMBER_OF_SCORED_ANSWERS__C:$18. VTR2_REVIEWER_USER__C:$16.
VTR2_NUMBER_OF_ANSWERED_ANSWERS_:$18. VTR2_TOTAL_SCORE__C:$18. VTR4_PERCENTAGE_COMPLETED__C:$3.;
run;
Test Records are Huge, so I have attached it. File size less that 200kb
Thank you in advance.
@Ksharp We get the following error.
ERROR: The LRECL / LINESIZE for infile '"\\eDiary_small.csv"' exceeds the maximum allowable length for an _INFILE_ or _INFILE_= variable (32,767). The DATA STEP will not be executed.
A single variable cannot be longer than the 32k limit either, ie the _infile_ cannot be used.
It's likely erroring out in your first data step but not the second.
What are you fixing with the first step? It looks like you read it in and write it back out, and then read it back in again. Maybe there's a way to avoid that?
I see some things that make me question how well your source file may be documented. You have a variable PATIENT_S_TIMEZONE__C as $255. I don't think I have ever seen a timezone indicator that ran more than 6 characters. Are getting text like "Grenwich Mean Time" or something for that? There are several of the variables with $255 that I wonder if the values actually come close to that length.
Plus variables whose name include such things a "number of answers" and are character???
You are reading time and date values into character variables as well. I normally try to make sure these are actually SAS time, date or datetime variables as appropriate as early as practical with appropriate formats.
If your use of _infile_ was to strip out newline/linefeed characters in the middle of long text strings, such as your "answer" variables, you might go the source that collects the data and see if those can be addressed when exported or the CSV file is created.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.