Re: Error importing line in CSV (Invalid data for *variable* in line *line* *column_range*)

TomKari · Posted 11-04-2017 05:23 PM

My apologies, I muddied the waters in my earlier post, because I looked at the listing that had spaces removed, and therefore the columns didn't line up.

I don't have an explanation for the problem, but in the hope of contributing, here is what I see as the pertinent details:

1. There appears to be a CRLF in the middle of a line around line 10524, which SAS isn't picking up correctly. Other than that, the data in the dump under the ERROR diagnostic looks fine.

2. In the NOTE, SAS says it is line 10524, but _N_ is 10523. I'm guessing that's because of FIRSTOBS=2: the header line is skipped, so physical line 10524 is only the 10523rd record read.

3. The "A character that could not be transcoded..." WARNING message.

4. SAS and Notepad++ are seeing 16813 records, but you feel there should be 16810 (the number of CRLFs plus 1).
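To check points 1 and 4 at the byte level, rather than relying on any program's idea of what a "line" is, something like the untested sketch below could count the line-break characters directly. It assumes the file is UTF-16LE (BOM FF FE), which is what the later posts suggest, and "file_path_name" is the same placeholder used elsewhere in this thread. Reading two bytes at a time with ENCODING='ANY' keeps SAS from transcoding anything.

```sas
/* Untested sketch. Assumes UTF-16LE, so CR is '0D00'x and LF is '0A00'x.  */
/* RECFM=F LRECL=2 makes each "record" one 2-byte code unit, and           */
/* ENCODING='ANY' stops SAS from transcoding the bytes.                     */
data _null_;
  length unit prev $2;
  retain prev;                       /* code unit from the previous iteration */
  infile "file_path_name" recfm=f lrecl=2 encoding='any' end=eof;
  input unit $char2.;
  if unit = '0D00'x then n_cr + 1;
  else if unit = '0A00'x then do;
    n_lf + 1;
    if prev = '0D00'x then n_crlf + 1;          /* a proper CR+LF pair */
    else put 'Bare LF at code unit ' _n_;
  end;
  /* A CR not followed by LF (a CR in the very last code unit is not checked) */
  if prev = '0D00'x and unit ne '0A00'x then do;
    cr_pos = _n_ - 1;
    put 'Bare CR at code unit ' cr_pos;
  end;
  prev = unit;
  if eof then put n_cr= n_lf= n_crlf=;
run;
```

If N_CRLF is smaller than N_CR or N_LF, the extra bare CRs or LFs would probably explain the different record counts: Notepad++ treats them as line ends, while a read with TERMSTR=CRLF does not.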

My next step would be to run the following (untested):

Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then output;
Run;

and then try running your import program against the reduced file. If it still fails, that's EXTREMELY weird. If it works, that's evidence that something earlier in the file is the problem.

Then futz around with this process, maybe changing the ENCODING options, etc., until something pokes you in the eye. My feeling is that there are some corrupted characters in there somewhere.

Good luck,
Tom

Autotelic · Posted 11-04-2017 06:19 PM

Thanks, I will only be able to try this in about 40 hours.
Actually, I expect 16809 records: the file ends with a blank line, so there are as many CRLFs as there are data lines.
The character that can't be transcoded is unrelated to this, IMO.
I actually asked about this here: https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-text-file-with-Unicode-U-0099/m-p/4099...

Autotelic · Posted 11-04-2017 06:21 PM

This thread is getting long. Apologies for this unrelated post, but I want to thank everyone who has been trying to help: thank you very much!
I'm going to keep working on this issue.

Patrick · Posted 11-04-2017 06:33 PM

@Autotelic

If possible, attach your actual data so we can investigate what's going on.

Have you already tried whether your code works on a small subset of the data? Using Notepad++, create a file with only a few records, but make sure you keep the line where SAS throws the error (plus the lines right before and right after it).
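If hand-editing a large file in Notepad++ is awkward, a DATA step can build the same kind of subset. This is only a sketch: "subset_file_path_name" is a hypothetical output name, and the record numbers are taken from the ERROR note in this thread (no FIRSTOBS= here, so _N_ counts the header as record 1). It writes with PUT _INFILE_ because the OUTPUT statement writes observations to a data set, not to the FILE destination.

```sas
/* Sketch: copy the header plus the records before, at and after the one */
/* SAS complains about into a small test file. Adjust the numbers.       */
data _null_;
  infile "file_path_name" lrecl=32767 encoding="UTF-16" termstr=crlf;
  file "subset_file_path_name" lrecl=32767 encoding="UTF-16" termstr=crlf;
  input;                                              /* fills _INFILE_ */
  if _n_ = 1 or 10523 <= _n_ <= 10525 then put _infile_;  /* copy as-is */
run;
```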

Autotelic · Posted 11-06-2017 07:32 AM

That code doesn't seem to write anything. I experimented with different values of _n_ and always got zero results. Any idea what's going on here?

NOTE: 16809 records were read from the infile "C:\Users\10173847.GRUPOECI\Documents\My SAS Files\Performance\Inputs 
MicroStrategy\Performance_RT_20171102.csv".
The minimum record length was 260.
The maximum record length was 572.
NOTE: 0 records were written to the file "C:\Users\10173847.GRUPOECI\Documents\My SAS Files\Performance\test.csv".

Kurt_Bremser · Posted 11-06-2017 08:19 AM

We are all "fishing in the dark" here.

Please post "that code" as you ran it, and at least the beginning of your infile as an attachment.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Autotelic · Posted 11-06-2017 10:04 AM

Sorry, @Kurt_Bremser
I'm not exactly sure how this works, but since I was quick-replying to a specific post, I assumed it would be identifiable somehow. The code in question is the one provided by TomKari in a post above.

Tom · Posted 11-04-2017 07:48 PM

@Autotelic wrote:

No idea. I'm way out of my element here, trying to convey as much useful information as I can.
When I import the file manually, SAS is able to do it correctly, but it seems to preprocess the file first, as the file in the infile statement is a .txt and it uses the encoding WLATIN1. If I try to import using this encoding, I get an error that says 'A byte-order mark indicates that the data is encoded in "utf-16le"'.

How would I go about finding that 2-byte character that might be in the file? I can't find anything out of the ordinary when I open the file in Notepad++, nor on Notepad for that matter.

Plus, when I import the file in pandas (with the encoding UTF-16), it imports it fine as well.

I'm at a loss.

Wait, is this the same file with the transcoding issue because of the U+0099 character?

https://communities.sas.com/t5/SAS-Enterprise-Guide/Importing-text-file-with-Unicode-U-0099/m-p/4101...

If your SAS session is using ENCODING=WLATIN1, then SAS will definitely transcode the file into WLATIN1 before putting it into the input buffer for the INPUT statement to read.

If you want to read a UTF-16 source file, make sure that your SAS session is using ENCODING=UTF-8. The session encoding is set when SAS starts.
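To confirm which encoding a session is actually running under (for example, after changing its configuration), a minimal check could look like this. It only reports settings and changes nothing; SYSENCODING is an automatic macro variable and PROC OPTIONS lists the ENCODING system option.

```sas
/* Report the encoding this SAS session was started with. */
%put NOTE: Session encoding is &sysencoding;
proc options option=encoding;
run;
```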

It should be a lot easier to read Unicode data when running SAS with Unicode support.

Autotelic · Posted 11-06-2017 10:02 AM

Alas, due to lack of permissions, I'm unable to change the encoding on the session following the steps provided in the link http://support.sas.com/kb/51/586.html
Is there any other way?

Tom · Posted 11-06-2017 10:58 AM

@Autotelic wrote:
Alas, due to lack of permissions, I'm unable to change the encoding on the session following the steps provided in the link http://support.sas.com/kb/51/586.html
Is there any other way?

Do you have PC SAS installed? Or are you using EG to connect to a remote server? If you are connecting to a local server then you should have PC SAS.

If your SAS was installed recently you should already have a command to launch SAS using Unicode support.

You could then use a PC SAS session to explore the file and either import it or fix it so that it can be used with WLATIN1 encoding.

If you do not see a way to start SAS with Unicode support, then ask your local support staff to set one up.

If you are only using remote servers with EG, then ask your team to set up an application server that uses Unicode support.

Then run your EG project/program on that server.

Autotelic · Posted 11-06-2017 12:40 PM

I was able to edit the sasv9.cfg file! I'm now on a UTF-8 session, but the error persists regardless.

Tom · Posted 11-06-2017 12:52 PM

Try the data _null_ step that wrote the offending line to a separate file again (use a different filename so you have both attempts), this time using your SAS Unicode session.
Then look at both little files using code like this to see exactly what is in each file:
data _null_;
infile 'filename' recfm=f lrecl=100 encoding='any';
input;
list;
run;
If the file really is UTF-16, you should see the BOM and then two bytes per character.
See if there is any difference between the two versions.
Also look to see if you can find the CR and/or LF that is making the little file look like it has three lines.

Kurt_Bremser · Posted 11-05-2017 02:40 AM

The BOM is right at the start of the file, and UTF-enabled programs react to it, but won't display it.

If you switch to hex display in Notepad++, you should see it.
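The same check can also be done from SAS itself. A minimal sketch, assuming "file_path_name" is replaced by the real path: it reads only the first two bytes without transcoding and reports which UTF-16 byte-order mark, if any, is there.

```sas
/* Sketch: inspect the first two bytes of the file, untranscoded. */
data _null_;
  infile "file_path_name" recfm=f lrecl=2 encoding='any' obs=1;
  input bom $char2.;
  if bom = 'FFFE'x then put 'UTF-16LE byte-order mark found';
  else if bom = 'FEFF'x then put 'UTF-16BE byte-order mark found';
  else put 'No UTF-16 BOM; first two bytes are ' bom $hex4.;
run;
```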

TomKari · Posted 11-06-2017 08:23 AM

Apologies, brain fart!

I always try to test code before I post it, but I had to rush to do something else.

Tom

Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then put _infile_;
Run;

Autotelic · Posted 11-06-2017 09:49 AM

Great! This worked, and it gives us a clue.

The code below output not two but three lines: the header row first, the problematic data line last, and, in between, the line just before the problematic line.

Data _null_;
Infile "file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
File "other_file_path_name" LRECL=32767 ENCODING="UTF-16" TERMSTR=CRLF;
Input;
If _n_ = 1 | _n_ = 10524 then put _infile_;
Run;

Importing "other_file_path_name" worked fine too.
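One possible reading of the three-line output is that record 10524, as SAS splits the file with TERMSTR=CRLF, contains a bare CR or LF that editors then display as a line break. A hedged way to test that idea across the whole file is the sketch below: it reads the file exactly as the step above does and flags every record whose text still contains a CR or LF after transcoding (in both WLATIN1 and UTF-8 sessions, CR is '0D'x and LF is '0A'x).

```sas
/* Sketch: list records that contain an embedded CR or LF, i.e. a line */
/* break that TERMSTR=CRLF does not treat as the end of the record.    */
data _null_;
  infile "file_path_name" lrecl=32767 encoding="UTF-16" termstr=crlf;
  input;
  pos = indexc(_infile_, '0D0A'x);   /* first CR or LF, 0 if none */
  if pos > 0 then put 'Record ' _n_ 'has an embedded CR/LF at position ' pos;
run;
```

If this flags record 10524 and a few others, that would be consistent with the gap between the 16813 lines Notepad++ counts and the 16809 records expected.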
