emaguin
Quartz | Level 8

I'd like to understand some of the notes coming back from a proc import command.

I'm reading Qualtrics data files, which are exported as CSV files with a .csv extension.

I get this note: EFI will truncate records > 32767; your record length was 131068

I opened the file in Notepad++ and the longest line is 14914 characters. My understanding, per Wikipedia, is that UTF-8 uses at most 4 bytes per character, so that line is at most about 60K bytes (4 x 14914 = 59656). Can someone help me understand the discrepancy, or is Notepad++ misleading?

My import command is this: 

proc import datafile='<input file string>' out=<output file string> replace;
  datarow=3;
  getnames=yes;
  guessingrows=max;
run;

I read line 1 of the file as variable names, which they are; line 2 is the ~15K-character item text line, which I skip; and line 3 is the first data line. Line 2 is useless to me. I could delete it using Notepad++ or Excel. However, does leaving a to-be-truncated but skipped line in the file have any adverse consequences?

I don't recall whether I've seen this, but what is the maximum string length for a character variable, and what happens if PROC IMPORT encounters a string longer than that?

Thanks, Gene Maguin

I'm posting this in Programming, but I suppose it's really Procedures. Does it matter? I assume that those of you who reply do so to anything you're knowledgeable about and care to answer, regardless of community.

5 REPLIES
ballardw
Super User

Some programs are more generous than SAS PROC IMPORT about what they treat as an end-of-line marker. If your file has a mix of LF and CR/LF line endings, Notepad++ may be accepting both as line ends.
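
One way to check for mixed line endings is to count the terminator bytes directly. The data step below is a minimal sketch, not tested against the file in question: the path and the fileref name rawcsv are hypothetical, and it reads one byte per iteration, so it will be slow on very large files. In UTF-8 the bytes 0D and 0A only ever occur as CR and LF, so multi-byte characters do not distort the counts. Line breaks embedded inside quoted text fields are counted too.

```sas
/* Hypothetical path: point this at the real export. */
filename rawcsv 'C:\data\qualtrics_export.csv';

data _null_;
  /* RECFM=F with LRECL=1 reads the file one byte at a time,
     so SAS does no end-of-line interpretation of its own. */
  infile rawcsv recfm=f lrecl=1 end=eof;
  input ch $char1.;
  retain prev_cr 0;

  if ch = '0D'x then n_cr + 1;        /* every carriage return  */
  else if ch = '0A'x then do;
    n_lf + 1;                         /* every line feed        */
    if prev_cr then n_crlf + 1;       /* LF directly after a CR */
  end;
  prev_cr = (ch = '0D'x);

  if eof then do;
    bare_cr = n_cr - n_crlf;          /* CR not followed by LF  */
    bare_lf = n_lf - n_crlf;          /* LF not preceded by CR  */
    put n_crlf= bare_cr= bare_lf=;
  end;
run;
```

If more than one of the three counts written to the log is nonzero, the file mixes terminators, and the line count Notepad++ shows need not match the record count SAS sees.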

If you are happy with the content when using datarow=3, then skipping the second line is okay. If your line 2 is an additional description of the data, I would keep it around and likely do something with it, possibly to assign labels to the variables.

If you have other issues, it might help to show the data step code generated by PROC IMPORT. It may be easier to modify that data step to read your data than to get PROC IMPORT to read it directly, because you can add INFILE options such as TERMSTR and LRECL to handle end-of-line or line-length issues, as well as possible encoding issues.
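
As a rough sketch of what such a modified data step could look like: the path, the column names (ResponseId, Q1, Q2_text), their lengths, and the LRECL value below are all hypothetical placeholders. The real INPUT list would come from the code PROC IMPORT writes to the log, and the maximum LRECL accepted depends on your SAS release.

```sas
/* Hypothetical path, columns, and lengths: take the real ones
   from the data step that PROC IMPORT writes to the log. */
data work.survey;
  infile 'C:\data\qualtrics_export.csv'
         dsd dlm=','         /* comma-delimited, quoted fields allowed      */
         truncover           /* short lines give missing values             */
         firstobs=3          /* skip the name line and the item text line   */
         lrecl=200000        /* assumed to exceed the longest line in bytes */
         termstr=crlf        /* only CR+LF ends a record                    */
         encoding='utf-8';   /* assumed file encoding                       */
  length ResponseId $ 40 Q1 8 Q2_text $ 2000;
  input ResponseId Q1 Q2_text;
run;
```

TERMSTR=CRLF only makes sense if the real records end in CR+LF. It has the side effect that a bare LF inside a text field no longer splits the record.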

A character variable is limited to 32,767 bytes, which can be fewer than 32,767 characters in a multi-byte encoding such as UTF-8. What PROC IMPORT does may well depend on the actual file, the SAS version, and so on. I would expect that the field most likely gets truncated. If you have a single field that long, you would have to write a data step to capture the data that would otherwise be truncated, and we would likely need to see the actual raw data to help, as it is unlikely to be a trivial exercise.

Reeza
Super User
Is SAS running on a server here? Are you using Unix? EFI makes me think you're using EG or DI.
If a field contains comments or open text, that's usually problematic.
emaguin
Quartz | Level 8
My copy of SAS is running on my Win 7 machine. What are EG and DI?

Reeza
Super User

EG => Enterprise Guide, which is primarily a point-and-click interface

DI => Data Integration Studio

There's also SAS Studio.

Tom
Super User

Please post the actual lines from the log if you want better advice.

A couple of ideas:

1) Your header line might be too long for PROC IMPORT. It does not (or at least did not; I haven't tested in a while) handle header lines longer than 32K. This can cause it to generate generic names for the later columns.

2) Perhaps you have been bitten by the stupidity of Excel on the Mac. For some reason, Excel on a Mac will by default export to CSV using carriage return alone as the end-of-line marker. Unless you tell SAS to use TERMSTR=CR on your INFILE (or FILENAME) statement, the file looks like one long line to SAS, since it expects either CR+LF (the Windows standard) or LF (the Unix standard). The programmers at Excel apparently never learned that macOS switched to Unix many years ago, so it no longer uses CR as the end-of-line marker. When exporting from Excel on a Mac, make sure to choose an output format that generates proper end-of-line markers.
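
If the file does turn out to use bare CR as its terminator, the TERMSTR=CR option can be put on a FILENAME statement and the fileref passed to PROC IMPORT. This is a sketch only: the path, the fileref name maccsv, and the output data set name are hypothetical. DBMS=CSV is stated explicitly because a fileref carries no extension for PROC IMPORT to infer the file type from.

```sas
/* Hypothetical path to a CSV saved with CR-only line endings. */
filename maccsv 'C:\data\export_from_mac.csv' termstr=cr;

proc import datafile=maccsv out=work.survey dbms=csv replace;
  getnames=yes;
  datarow=3;
  guessingrows=max;
run;
```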
