One of the data files that came with Programming 2, storm_2017.sas7bdat, has lots of invalid characters that cause errors.
What is the best way to handle these invalid characters? I am thinking about downloading the data as a CSV file, removing them manually, and then uploading it back into SAS.
Make a dictionary of the invalid characters and their UTF-8 replacements (for example, in an array), and then use the TRANSTRN function to make each replacement.
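A minimal sketch of that idea, assuming the bad characters are Latin-1 bytes stored in a dataset that SAS treats as UTF-8. The demo dataset WORK.DEMO, the column name Location and the three byte pairs are hypothetical; check the actual hex codes in your own data first. It also assumes TRANSTRN matches the lone invalid bytes as plain bytes in your session, which is worth verifying on a few rows.

```sas
/* Hypothetical demo data: 'E1'x is a Latin-1 a-acute, not valid UTF-8 on its own */
data work.demo;
  length Location $40;
  Location = 'Yucat' || 'E1'x || 'n Peninsula';
run;

data work.demo_fixed;
  set work.demo;   /* replace with your own dataset */
  /* The "dictionary": each bad byte and its UTF-8 replacement (assumes a Latin-1 source) */
  array bad[3]  $1 _temporary_ ('E1'x   'E9'x   'F3'x);    /* Latin-1 a-acute, e-acute, o-acute */
  array good[3] $2 _temporary_ ('C3A1'x 'C3A9'x 'C3B3'x);  /* the same letters in UTF-8 */
  do i = 1 to dim(bad);
    Location = transtrn(Location, bad[i], good[i]);
  end;
  drop i;
run;
```

Each replacement adds one byte, so values close to the column's defined length can be truncated; give the column a larger LENGTH first if that matters.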
Sorry, I am still a beginner. I kind of understand what you're saying, but I have no idea how to implement it in SAS.
This will take some research on your part. Use the $HEX format to display the hex codes of your strings, so you can use them to define your "target" strings for TRANSTRN (see the function documentation).
From the context, you will be able to determine the intended character (in "Yucatan", it must be some kind of accented a). Then look up its multibyte UTF-8 encoding in one of the UTF-8 tables available online and use that as the "replacement" string.
Edit: fixed typos
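Both steps can be tried in a single DATA step. This is a sketch with a hypothetical demo value; with real data you would SET your dataset and filter with WHERE. Besides online tables, the UNICODE function can convert a \u escape into the session encoding, which gives the replacement bytes directly (in a UTF-8 session, a-acute is C3A1).

```sas
data _null_;
  /* Hypothetical value containing a stray Latin-1 byte ('E1'x) */
  Location = 'Yucat' || 'E1'x || 'n Peninsula';
  /* $HEXw. prints 2 hex digits per byte: 17 bytes -> width 34 */
  put Location= $hex34.;
  /* UNICODE converts the \u escape to the session encoding;
     $HEX4. shows the first two bytes of the result */
  target = unicode('\u00E1');
  put target= $hex4.;
run;
```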
Does it actually cause any trouble when doing the exercises? If not, just ignore it.
To figure out what is going on, do a little research.
Find out what encoding your SAS session is using.
proc options option=encoding;
run;
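The session encoding is also available in the SYSENCODING automatic macro variable, which is handy if you want to check it from code rather than read the PROC OPTIONS output:

```sas
/* Write the session encoding to the log two ways */
%put NOTE: Session encoding is &sysencoding;
%put NOTE: ENCODING option value is %sysfunc(getoption(encoding));
```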
Find out what encoding the dataset is using.
proc contents data=mylib.storm_2017;
run;
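If you only want the encoding, a query against DICTIONARY.TABLES returns it without the rest of the PROC CONTENTS report. MYLIB is the placeholder libref from the post; librefs and member names are stored in uppercase there.

```sas
/* Show the encoding recorded in the dataset descriptor */
proc sql;
  select libname, memname, encoding
    from dictionary.tables
    where libname = 'MYLIB' and memname = 'STORM_2017';
quit;
```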
There are two main types of encoding in use.
Single-byte encodings, such as LATIN1 or WLATIN1. In these encodings each character uses exactly one byte, which means there are only 256 possible characters (including invisible control characters such as TAB, CARRIAGE RETURN and LINE FEED).
And multibyte encodings, which will normally mean UTF-8. In UTF-8 plain ASCII characters still take one byte, but other characters require 2, 3 or 4 bytes.
So literally thousands of characters can be represented.
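One way to see the byte/character difference in your own session is to compare LENGTH, which counts bytes, with KLENGTH, which counts characters in the session encoding. The string is built from hex literals ('C3A1'x is the UTF-8 encoding of a-acute) so the example does not depend on how your editor saves the program.

```sas
data _null_;
  s = 'Yucat' || 'C3A1'x || 'n';  /* "Yucatan" with a-acute, as UTF-8 bytes */
  bytes = length(s);              /* counts bytes: 8 in any session */
  chars = klength(s);             /* counts characters: 7 in UTF-8, 8 in a single-byte session */
  put s= bytes= chars=;
run;
```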
If your SAS session uses a single-byte encoding, the dataset may contain characters that are not among the 256 available in that encoding. So if your SAS session is NOT using UTF-8, see whether you can change that (the ENCODING option can only be set when SAS starts up). That way SAS will be able to represent any character that might be in the dataset.
Note: if the encoding setting on the dataset is wrong, you can still end up with invalid strings. For example, if the strings in the dataset actually use a single-byte encoding but SAS thinks they are UTF-8, some byte combinations will not be valid UTF-8. That can also be fixed, but it might require more effort.
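For that mislabeled case, one possible repair (a sketch, not the only option) is to reinterpret the bytes as Latin-1 and re-encode them as UTF-8 with the KCVT function. It assumes every value in the column really is Latin-1; running it on text that is already valid UTF-8 would garble it. The demo dataset and column names are hypothetical.

```sas
data work.kcvt_demo;
  length raw $20 fixed $40;             /* UTF-8 output can need more bytes than the input */
  raw = 'Yucat' || 'E1'x || 'n';        /* Latin-1 bytes: not valid UTF-8 */
  fixed = kcvt(raw, 'latin1', 'utf-8'); /* read the bytes as Latin-1, write them as UTF-8 */
  put raw= $hex16. / fixed= $hex16.;
run;
```

If the whole dataset is mislabeled, the ENCODING= data set option on the SET statement is another route worth reading up on.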
Yeah, but I just completed the exercise using another .sas7bdat file.