Nietzsche
Lapis Lazuli | Level 10

One of the data files that came with Programming 2, storm_2017.sas7bdat, has lots of invalid characters that cause errors.

 

willtopower_0-1666419684706.png

 

What is the best way to treat these invalid characters? I am thinking about downloading the file as CSV, removing them manually, and then uploading it back into SAS.

 

SAS Base Programming (2022 Dec), Preparing for SAS Advanced Programming (Cancelled).
6 REPLIES
Nietzsche
Lapis Lazuli | Level 10

I am sorry, I am still a beginner. I kind of understand what you're saying, but I have no idea how to implement it in SAS.

Ksharp
Super User
It looks like a carriage return ('\r') or line feed ('\n') character.
Can you change your SAS session's encoding to UTF-8 or another encoding? That might solve it.
Kurt_Bremser
Super User

This will take some research on your part. Use the $HEX format to display the hex codes of your strings, so you can use them for defining your "target" characters (see the function documentation).

From the context, you will be able to determine the wanted character (in "Yucatan", it must be some kind of accented a). Look up its multi-byte "replacement" string in one of the UTF-8 tables available online.
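For a beginner, the two steps above might look roughly like the sketch below. It assumes the library MYLIB used elsewhere in this thread and a character variable called NAME, which is a hypothetical column name. The byte values are only an illustration: 'E1'x is "á" in LATIN1/WLATIN1 and 'C3A1'x is the same letter in UTF-8, but the bytes in your file may differ, so check the hex output first.

```sas
/* Step 1: print the first few values next to their raw bytes.       */
/* NAME is a placeholder; substitute the column that shows the issue. */
data _null_;
  set mylib.storm_2017 (obs=20);
  put name= / name= $hex40.;
run;

/* Step 2: replace the offending byte with the wanted UTF-8 bytes.    */
/* Assumption: the bad byte is 'E1'x and the session runs in UTF-8.   */
data work.storm_fixed;
  set mylib.storm_2017;
  name = tranwrd(name, 'E1'x, 'C3A1'x);
run;
```

Because the replacement is one byte longer than the original, a value that already fills the variable's length could be truncated, so it is worth comparing a few rows before and after.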

 

Edit: fixed typos

Tom
Super User

Does it really cause any trouble for doing the exercises?  If not then just ignore it.

 

To figure out what is going on do a little research.

 

Find out what encoding your SAS session is using.

proc options option=encoding;
run;

Find out what encoding the dataset is using.

proc contents data=mylib.storm_2017;
run;

There are two main types of encodings used. 

Single-byte encodings such as LATIN1 or WLATIN1.  In those encodings each character uses only one byte.  But that means that there are only 256 possible characters (which includes invisible characters like TAB, CARRIAGE RETURN and LINE FEED).

And multibyte encodings, which will normally be UTF-8.  With UTF-8 the basic ASCII characters still use one byte, but other characters require 2, 3 or 4 bytes to be represented.

So there are literally thousands of characters that can be represented.
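A small way to see the difference between bytes and characters, as a sketch: LENGTH counts bytes while KLENGTH counts characters. The string below is just an example value, and the result depends on your session encoding. In a UTF-8 session the byte count should come out one higher than the character count, because "á" takes two bytes; in a single-byte session the two counts should match.

```sas
data _null_;
  length s $ 20;
  s = 'Yucatán';         /* example value, not taken from the dataset */
  bytes = length(s);     /* number of bytes, ignoring trailing blanks */
  chars = klength(s);    /* number of characters                      */
  put s= $hex40. bytes= chars=;
run;
```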

 

If your SAS session is using a single byte encoding then it might be that the characters in the dataset are ones that are not included in the 256 characters available in that encoding. So if your SAS session is NOT using UTF-8 as the encoding then see if you can change that.  That way SAS will be able to represent any character that might be in the dataset.  

 

Note:  If the encoding settings on the dataset are confused you could still have invalid strings.  For example, if the strings in the dataset are actually using a single-byte encoding, but SAS thinks they are UTF-8, then there are some combinations of bytes that are not valid in UTF-8.  That can also be fixed, but might require more effort.
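If the dataset turns out to be labelled with the wrong encoding, one possible fix is sketched below. It assumes the strings are really WLATIN1, which is a guess; use whatever PROC CONTENTS and the hex codes suggest. The two steps are alternatives, so run only one of them.

```sas
/* Alternative 1: tell SAS how to interpret the file while reading it. */
/* The original file is left untouched; the copy goes to WORK.         */
data work.storm_2017;
  set mylib.storm_2017 (encoding=wlatin1);
run;

/* Alternative 2: correct the encoding recorded in the file header.    */
/* This changes the dataset itself and needs write access to MYLIB.    */
proc datasets library=mylib nolist;
  modify storm_2017 / correctencoding=wlatin1;
quit;
```

With the first alternative, values that grow when converted to UTF-8 may no longer fit in the variable's length, so watch the log for truncation warnings.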

 

Nietzsche
Lapis Lazuli | Level 10

Yeah, but I just completed the exercise with another sas7bdat file.
