BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tom
Super User Tom
Super User

Post an minimal reproducible example file and we can look at it.

Ksharp
Super User

Sure. Here is . I modify it at line 7 and 8 as showed in my picture above.

My code could get right result.

 

 

 

Tom
Super User Tom
Super User

How is a file with almost 30,000 lines MINIMAL?

NOTE: 29960 records were read from the infile 'c:\downloads\have.csv'.
The minimum record length was 0.
The maximum record length was 6546.

The issue has nothing to do with UTF-8 support.

That file has a linefeed in the middle of the unquoted word english.

 

Ksharp
Super User

@Tom 

“The issue has nothing to do with UTF-8 support.”

 

I mean SAS LOG didn't show any ERROR info, therefore I am supposed to import correctly. However, I get wrong/screw result. as the following pictures showed:

 

Ksharp_0-1663156219125.pngKsharp_1-1663156233480.png

 

Tom
Super User Tom
Super User

If the CSV file is not validly constructed then any attempt to read it is going to have problems.

 

The macro you are asking about is designed to remove linefeeds or carriage returns that are inside of quoted strings.  The file you provided has (at least) an extra linefeed inside of a string that is not quoted.  The word english that you circled in red in one of your photographs.

 

So if you want to fix that file you will need to use a different algorithm. Or just fix it manually using a text editor.

Ksharp
Super User
OK.I know. But my code could get job done.
My code is depend on the pattern of starting a new row, that also not be able to guaranteed to import it correctly one hundred percent.
Tom
Super User Tom
Super User

@Ksharp wrote:
OK.I know. But my code could get job done.
My code is depend on the pattern of starting a new row, that also not be able to guaranteed to import it correctly one hundred percent.

Checking that the start of a line follows a pattern is a good method to attempt to located unexpected line breaks.

As you mention it might not be 100% accurate if the extra line break happens to make a line that looks like the start of a valid new record.

It is probably worth adding a step to re-confirm how many fields are on each line before attempting to read the data.  When the field counts are off you could then resort to manual corrections.

DaveShea
Lapis Lazuli | Level 10
Whoever wrote that %Replace_CRLF macro (via the link to GitHub) has produced a very simple, yet elegant solution to the problem of LF characters embedded in free-text fields. It looks like it is easily extensible as well. Nice work, thanks.

DownunderDave
Wellington

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 8108 views
  • 8 likes
  • 6 in conversation