mona4u
Lapis Lazuli | Level 10

Hi All, 

I recently faced an issue with SAS and I am not sure about the cause of the issue. 

I ran the same inputs twice, and the second time SAS strangely produced extra symbols in place of the commas, even though I didn't change anything in my code.

I have attached both outputs. Please let me know if there is a solution for this problem.

 

 

 

[Attached screenshots: Capture_SAS.PNG, Capture_SAS1.PNG]

1 ACCEPTED SOLUTION

Tom
Super User

So the text file you posted is using UTF-8 encoding.  You can tell because the curly quote in

Barrett’s Esophagus

is stored as the three-byte sequence 'E28099'x instead of the single byte '92'x that it would require in WLATIN1 encoding.
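
If you want to see those bytes for yourself, something like the following sketch should work. The file name comments.txt is a placeholder, not the name of the attached file, and the 40-byte width is an arbitrary choice. It reads the start of the first few records without any transcoding option and prints them in hex, so a UTF-8 curly quote would show up as E28099 while a WLATIN1 one would show up as 92.

```sas
/* Sketch only: 'comments.txt' is a hypothetical path, replace it with your file. */
data _null_;
  infile 'comments.txt' obs=3 truncover;  /* look at the first 3 records only */
  input line $char40.;                    /* first 40 bytes, blanks preserved */
  put line $hex80.;                       /* each byte printed as 2 hex digits */
run;
```

This assumes the session is using a single-byte encoding such as WLATIN1. In a session that already transcodes the file on input, the hex in the log may not match the bytes on disk.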

 

Add this option to your INFILE statement.

encoding='utf-8'
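
In context, the option sits alongside whatever else is already on the INFILE statement. This is a minimal sketch rather than your actual program: the file name, the delimiter options, and the variable names subjid and comment are all assumptions made for illustration.

```sas
/* Sketch only: file name, layout, and variable names are hypothetical. */
data comments;
  infile 'comments.csv' encoding='utf-8'   /* transcode from UTF-8 while reading */
         dsd dlm=',' truncover firstobs=2; /* comma-delimited, skip a header row */
  length subjid $10 comment $200;
  input subjid comment;
run;
```

With the option in place, SAS converts each UTF-8 character to the session encoding as the record is read, so the curly quote should arrive as a single character instead of three stray symbols.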

 

If the data has many other UTF-8 characters, you might need to run your SAS session with UTF-8 encoding instead of WLATIN1 (or whatever single-byte encoding you are using), because you might encounter Unicode characters that cannot be mapped to a single-byte code in your session's encoding.
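
To find out which encoding the current session is using, either of the two checks below should do. Both write the answer to the log.

```sas
/* Show the ENCODING system option for the current session */
proc options option=encoding;
run;

/* The same information from the automatic macro variable */
%put Session encoding: &sysencoding;
```

The session encoding is fixed when SAS starts, so switching to UTF-8 means launching SAS with the -encoding utf-8 startup option (or a configuration file that sets it) rather than changing it in the middle of a program. How you do that depends on your installation.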

 


5 REPLIES
Kurt_Bremser
Super User

Please post:

  • the input data, either as attachment or in a {i} sub window
  • your code ("little running man" button)
  • the logs from both executions (once again, {i} button)
mona4u
Lapis Lazuli | Level 10

Hi, 

I have attached all the files: the input, the output, and the program.

I don't know how to save the log. 

 

Thanks. 


Reeza
Super User
This means your input file did not properly quote embedded commas in your text or you have both commas and quotes in your comments messing it up. In these types of cases cleaning the data can be painful. I'll often just import it from an Excel file, if the Excel file is formatted correctly.
Tom
Super User

Those are not commas.  A comma sits on the baseline and has a tail that extends below the baseline. It is a period with a tail.

Those look like single quotes in your first example.  In the second one perhaps somehow the text went through something that decided it should replace the simple single quotes with something "prettier".

 

Check the ENCODING settings of the two SAS sessions.  And if you are actually reading the data from text files, then check the encoding (if any) that is indicated by the BOM (byte order mark, https://en.wikipedia.org/wiki/Byte_order_mark) of the files.
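
One way to look for a BOM is to read the first three bytes of the file as fixed-length binary and print them in hex. This is a rough sketch: comments.txt is a placeholder path, and it assumes that reading with RECFM=F returns the raw bytes unchanged.

```sas
/* Sketch only: 'comments.txt' is a hypothetical path, replace it with your file. */
data _null_;
  infile 'comments.txt' recfm=f lrecl=3 obs=1;  /* one 3-byte fixed-length record */
  input first3 $char3.;
  put first3= $hex6.;                           /* EFBBBF is the UTF-8 BOM */
run;
```

If the log shows EFBBBF, the file starts with a UTF-8 byte order mark. Anything else means there is no UTF-8 BOM, which does not by itself rule out UTF-8, since the BOM is optional.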
