My apologies, I muddied the waters in my earlier post, because I looked at the listing that had spaces removed, and therefore the columns didn't line up.
I don't have an explanation for the problem; in the hope of contributing, here is what I see as the pertinent details:
1. There appears to be a CRLF in the middle of a line around line 10524, which SAS isn't picking up correctly. Other than that, the data on the dump on the ERROR diagnostic looks fine.
2. On the NOTE SAS is saying it is line 10524, but _N_ is 10523. I'm guessing that's because of the Firstobs=2.
3. The "A character that could not be transcoded..." WARNING message.
4. SAS and Notepad++ are seeing 16813 records, but you feel there should be 16810 (number of CRLF plus 1).
My next step would be to run the following (untested):
Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then output;
Run;
and then try running your import program against the reduced file. If it still fails, that would be EXTREMELY weird. If it works, that is evidence that something earlier in the file is the problem.
Then futz around with this process, maybe changing the ENCODING options, etc, until something pokes you in the eye. My feeling is it's some corrupted characters in there somewhere.
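One way to check the record-count discrepancy (16813 vs. 16810) outside SAS is to count the line terminators directly. The following is only a sketch in Python, not part of the SAS program above; the function names `load_utf16` and `count_terminators` are made up for illustration, and it assumes the file starts with a UTF-16 byte-order mark so that Python's "utf-16" codec can pick the byte order.

```python
import re


def load_utf16(path):
    """Read the whole file as text without translating line endings.

    newline="" keeps CR and LF characters exactly as they are in the file.
    """
    with open(path, "r", encoding="utf-16", newline="") as fh:
        return fh.read()


def count_terminators(text):
    """Count CRLF pairs and any stray CR or LF characters."""
    return {
        "crlf": text.count("\r\n"),
        # a CR that is not followed by LF
        "lone_cr": len(re.findall(r"\r(?!\n)", text)),
        # an LF that is not preceded by CR
        "lone_lf": len(re.findall(r"(?<!\r)\n", text)),
    }


# Usage (the path is a placeholder, as in the SAS code above):
# print(count_terminators(load_utf16("file_path_name")))
```

If "lone_cr" or "lone_lf" is non-zero, those characters are candidates for the extra records that a reader not restricted to TERMSTR=CRLF would see.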
Good luck,
Tom
This thread is getting long. Apologies for this unrelated post, but I want to thank everyone that has been trying to help: thank you very much!
I'm going to keep working on this issue.
If it's possible then attach your actual data so we can investigate what's going on.
Have you already tested whether your code works on a small subset of the data? Using Notepad++, create a file with only a few records, but make sure you keep the line where SAS throws the error (plus the lines right before and right after it).
That code doesn't seem to write anything. I experimented with different values of _n_ and always got zero results. Any idea of what's going on here?
NOTE: 16809 records were read from the infile "C:\Users\10173847.GRUPOECI\Documents\My SAS Files\Performance\Inputs MicroStrategy\Performance_RT_20171102.csv".
      The minimum record length was 260.
      The maximum record length was 572.
NOTE: 0 records were written to the file "C:\Users\10173847.GRUPOECI\Documents\My SAS Files\Performance\test.csv".
We are all "fishing in the dark" here.
Please post "that code" as you ran it, and at least the beginning of your infile as an attachment.
@Autotelic wrote:
No idea. I'm way out of my element here, trying to convey as much useful information as I can.
When I import the file manually, SAS is able to do it correctly, but it seems to preprocess the file first, as the file in the infile statement is a .txt and it uses the encoding WLATIN1. If I try to import using this encoding, I get an error saying that 'A byte-order mark indicates that the data is encoded in "utf-16le"'.
How would I go about finding that 2-byte character that might be in the file? I can't find anything out of the ordinary when I open the file in Notepad++, nor on Notepad for that matter.
Plus, when I import the file in pandas (with the encoding UTF-16), it imports it fine as well.
I'm at a loss.
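Since the file already loads in Python, one way to hunt for the offending character is to try encoding every character to the target single-byte encoding and report the ones that fail. This is a rough sketch under two assumptions: that Python's "cp1252" codec is a close enough stand-in for what SAS calls WLATIN1, and that records end in CRLF. The function name `find_untranscodable` is invented for this example.

```python
def find_untranscodable(text, target="cp1252", terminator="\r\n"):
    """Return (line, column, code point) for each character that the
    target encoding cannot represent. Lines and columns are 1-based."""
    hits = []
    for line_no, line in enumerate(text.split(terminator), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(target)
            except UnicodeEncodeError:
                hits.append((line_no, col, "U+%04X" % ord(ch)))
    return hits


# Usage (placeholder path; newline="" keeps the CRLFs intact):
# with open("file_path_name", encoding="utf-16", newline="") as fh:
#     for hit in find_untranscodable(fh.read()):
#         print(hit)
```

Characters such as control codes in the U+0080 to U+009F range are not shown by most editors, which may be why nothing looks unusual in Notepad++.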
Wait, is this the same file with the transcoding issue because of the U+0099 character?
If your SAS session is using ENCODING=WLATIN1 then SAS will definitely transcode the file into WLATIN1 before putting it into the input buffer for the INPUT statement to read.
If you want to read a UTF-16 source file then make sure that your SAS session is using ENCODING=UTF-8. The ENCODING is determined when SAS starts.
It should be a lot easier to read Unicode data when running SAS with Unicode support.
@Autotelic wrote:
Alas, due to lack of permissions, I'm unable to change the encoding on the session following the steps provided in the link http://support.sas.com/kb/51/586.html
Is there any other way?
Do you have PC SAS installed? Or are you using EG to connect to a remote server? If you are connecting to a local server then you should have PC SAS.
If your SAS was installed recently you should already have a command to launch SAS using Unicode support.
You could then use a PC SAS session to explore the file and either import it or fix it so that it can be used with WLATIN1 encoding.
If you do not see a way to start SAS with Unicode support, then ask your local support staff to set one up.
If you are only using remote servers with EG, then ask your team to set up an application server that uses Unicode support.
Then run your EG project/program using that server.
The BOM is right at the start of the file, and UTF-enabled programs react to it, but won't display it.
If you switch to hex display in notepad++, you should see it.
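If a hex view is not handy, the first few bytes can also be checked with a short script. The sketch below is in Python and uses the BOM constants from the standard `codecs` module; the helper names `read_head` and `detect_bom` are hypothetical, and the returned labels are Python codec names, not SAS encoding names.

```python
import codecs


def read_head(path, n=4):
    """Return the first n raw bytes of the file."""
    with open(path, "rb") as fh:
        return fh.read(n)


def detect_bom(head):
    """Map leading bytes to a Python codec name, or None if no BOM is found.

    UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
    """
    candidates = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in candidates:
        if head.startswith(bom):
            return name
    return None


# Usage (placeholder path):
# print(detect_bom(read_head("file_path_name")))
```

For a file that SAS reports as "utf-16le", the first two bytes would be expected to be FF FE.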
Apologies, brain fart!
I always try to test code before I post it, but I had to rush to do something else.
Tom
Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then put _infile_;
Run;
Great! This worked, and it gives us a clue.
The code below wrote not two but three lines: the header row first, the problematic data line last, and in between them the line just before the problematic one.
Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then put _infile_;
Run;
The importation of "other_file_path_name" worked fine too.