TomKari
Onyx | Level 15

My apologies, I muddied the waters in my earlier post, because I looked at the listing that had spaces removed, and therefore the columns didn't line up.

 

I don't have any explanations of the problem; in hopes of contributing, here are what I see as the pertinent details:

 

1. There appears to be a CRLF in the middle of a line around line 10524, which SAS isn't picking up correctly. Other than that, the data on the dump on the ERROR diagnostic looks fine.

 

2. On the NOTE SAS is saying it is line 10524, but _N_ is 10523. I'm guessing that's because of the Firstobs=2.

 

3. The "A character that could not be transcoded..." WARNING message.

 

4. SAS and Notepad++ are seeing 16813 records, but you feel there should be 16810 (number of CRLF plus 1).

 

My next step would be to run the following (untested)

 

Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then output;
Run;

 

and then try running your import program against the reduced file. If it still fails, EXTREMELY weird. If it works, evidence that something higher up is a problem.

 

Then futz around with this process, maybe changing the ENCODING options, etc, until something pokes you in the eye. My feeling is it's some corrupted characters in there somewhere.

 

Good luck,
Tom

Autotelic
Obsidian | Level 7
Thanks, I will only be able to try this in about 40 hours.
Actually I expect 16809 records, that's because the file ends with a blank line, so there are as many CRLFs as there are datalines.
The character that can't be transcoded is unrelated to this, IMO.
I actually asked about this here: https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-text-file-with-Unicode-U-0099/m-p/4099...
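
Since the record count hinges on how many CRLF terminators the file really contains, one way to check it outside SAS is to count them directly. Below is a minimal Python sketch, assuming the file decodes as UTF-16 with a byte-order mark; the function name `count_line_endings` is made up for illustration and is not from any post in this thread.

```python
import re


def count_line_endings(path, encoding="utf-16"):
    """Count CRLF pairs, lone CRs and lone LFs in a text file."""
    # newline="" turns off newline translation, so CR and LF survive decoding.
    with open(path, encoding=encoding, newline="") as handle:
        text = handle.read()
    return {
        "crlf": text.count("\r\n"),
        # A CR that is not followed by LF.
        "lone_cr": len(re.findall(r"\r(?!\n)", text)),
        # An LF that is not preceded by CR.
        "lone_lf": len(re.findall(r"(?<!\r)\n", text)),
    }
```

If `crlf` matches the expected number of records while `lone_cr` or `lone_lf` is non-zero, that would point to a stray terminator embedded in a data line, which a reader honoring TERMSTR=CRLF and an editor such as Notepad++ may count differently.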
Autotelic
Obsidian | Level 7

This thread is getting long. Apologies for this unrelated post, but I want to thank everyone that has been trying to help: thank you very much!
I'm going to keep working on this issue.

Patrick
Opal | Level 21

@Autotelic

If it's possible then attach your actual data so we can investigate what's going on.

Have you already tried if your code works on a small subset of data? Using Notepad++ create a file with only a few records but make sure you keep the line where SAS throws the Error (plus the line right before and right after).

Autotelic
Obsidian | Level 7

 

That code doesn't seem to write anything. I experimented with different values of _n_ and always got zero results. Any idea what's going on here?

NOTE: 16809 records were read from the infile "C:\Users\10173847.GRUPOECI\Documents\My SAS Files\Performance\Inputs 
MicroStrategy\Performance_RT_20171102.csv".
The minimum record length was 260.
The maximum record length was 572.
NOTE: 0 records were written to the file "C:\Users\10173847.GRUPOECI\Documents\My SAS Files\Performance\test.csv".

 

Autotelic
Obsidian | Level 7
Sorry, @Kurt_Bremser
I'm not exactly sure how this works, but since I was quick-replying to a specific post, I assumed it would be identifiable somehow. The code in question is the one provided by TomKari in a post above.
Tom
Super User

@Autotelic wrote:

No idea. I'm way out of my element here, trying to convey as much useful information as I can.
When I import the file manually, SAS does it correctly, but it seems to preprocess the file first: the file in the INFILE statement is a .txt and it uses the encoding WLATIN1. If I try to import using this encoding, I get an error saying 'A byte-order mark indicates that the data is encoded in "utf-16le"'.

How would I go about finding that 2-byte character that might be in the file? I can't find anything out of the ordinary when I open the file in Notepad++, nor on Notepad for that matter.

Plus, when I import the file in pandas (with the encoding UTF-16), it imports it fine as well.

I'm at a loss.
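
One way to hunt for an odd character that editors do not display is to scan the decoded text for control characters and report where they sit. The following is a rough Python sketch under the assumption that records end in CRLF and the file is UTF-16; `load_utf16` and `find_control_chars` are hypothetical helper names.

```python
import unicodedata


def load_utf16(path):
    """Read a UTF-16 file without translating CR or LF."""
    with open(path, encoding="utf-16", newline="") as handle:
        return handle.read()


def find_control_chars(text):
    """List (line, column, code point) for control characters inside
    CRLF-terminated records. TAB is skipped; a lone CR or LF is reported."""
    hits = []
    for line_no, line in enumerate(text.split("\r\n"), start=1):
        for col, ch in enumerate(line, start=1):
            # Category "Cc" covers C0 and C1 controls, e.g. U+0099.
            if unicodedata.category(ch) == "Cc" and ch != "\t":
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits
```

Running `find_control_chars(load_utf16(path))` on the CSV would list any C1 control such as U+0099, plus any CR or LF that sits in the middle of a record, with line numbers that can be compared against the line SAS complains about.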


Wait, is this the same file with the transcoding issue because of the U+0099 character?

https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-text-file-with-Unicode-U-0099/m-p/4101...

 

If your SAS session is using ENCODING=WLATIN1 then SAS will definitely transcode the file into WLATIN1 before putting it into the input buffer for the INPUT statement to read.

 

If you want to read a UTF-16 source file then make sure that your SAS session is using ENCODING=UTF-8. The ENCODING is determined when SAS starts. 

 

It should be a lot easier to read Unicode data when running SAS with Unicode support.

Autotelic
Obsidian | Level 7
Alas, due to lack of permissions, I'm unable to change the encoding on the session following the steps provided in the link http://support.sas.com/kb/51/586.html
Is there any other way?
Tom
Super User

@Autotelic wrote:
Alas, due to lack of permissions, I'm unable to change the encoding on the session following the steps provided in the link http://support.sas.com/kb/51/586.html
Is there any other way?

Do you have PC SAS installed? Or are you using EG to connect to a remote server?  If you are connecting to a local server then you should have PC SAS.

If your SAS was installed recently you should already have a command to launch SAS using Unicode support.


You could then use a PC SAS session to explore the file and either import it or fix it so that it can be used with WLATIN1 encoding.

If you do not see a way to start SAS with Unicode support then ask your local support staff to set one up.

 

If you are only using remote servers with EG then ask your team to set up an application server that uses Unicode support.

Then run your EG project/program using that server.

Autotelic
Obsidian | Level 7
I was able to edit the sasv9.cfg file! I'm now on a utf-8 session, but the error persists regardless.
Tom
Super User
Using your SAS Unicode session, rerun the data _null_ step that writes the offending line to a separate file (use a different filename so you keep both attempts).
Then look at both little files using code like this to see the actual characters in the file.
data _null_;
  infile 'filename' recfm=f lrecl=100 encoding='any';
  input;
  list;
run;
If the file is really UTF-16 you should see the BOM and then two bytes per character.
See if there is any difference between the two versions.
Also look to see if you can find the CR and/or LF that is making the little file look like it has three lines.
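
The same byte-level check can also be done outside SAS. Here is a small Python sketch that reports which byte-order mark, if any, starts the data and prints the bytes in hex; `describe_bom` and `hex_dump` are illustrative names, not part of the SAS approach described above.

```python
import codecs


def describe_bom(raw):
    """Return the encoding implied by a leading byte-order mark, or None."""
    # UTF-32 is tested first because its little-endian BOM starts with the UTF-16 one.
    candidates = [
        (codecs.BOM_UTF32_LE, "utf-32le"),
        (codecs.BOM_UTF32_BE, "utf-32be"),
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF16_LE, "utf-16le"),
        (codecs.BOM_UTF16_BE, "utf-16be"),
    ]
    for bom, name in candidates:
        if raw.startswith(bom):
            return name
    return None


def hex_dump(raw, width=16):
    """Format bytes as lines of 'offset  hex bytes'."""
    lines = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:08X}  {hex_part}")
    return lines
```

Reading the little file with `open(path, "rb").read()` and passing the bytes to both functions shows the BOM (FF FE for little-endian UTF-16) followed by two bytes per character; in that layout a CR appears as 0D 00 and an LF as 0A 00.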
TomKari
Onyx | Level 15

Apologies, brain fart!

 

I always try to test code before I post it, but I had to rush to do something else.

 

Tom

 

Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then put _infile_;
Run;
Autotelic
Obsidian | Level 7

Great! This worked, and it gives us a clue.

The code below output not two, but three lines: the header row first, the problematic data line last, and in between, the line just before the problematic line.

Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then put _infile_;
Run;

Importing "other_file_path_name" worked fine too.

 
