SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
geneshackman
Pyrite | Level 9

I have a data set, sas can't seem to read in the equal to, greater than symbol, ≥400%

 

I use this code

PROC IMPORT DATAFILE= "\\myfolder\pa66.xlsx"
out=pa66_working dbms=xlsx replace;
sheet='2016-2017';
RUN;

 

I get the following

 

NOTE: Variable Name Change. Data Source -> Data_Source
NOTE: Variable Name Change. Stratifier Subgroup -> Stratifier_Subgroup
NOTE: Variable Name Change. Standard Error -> Standard_Error
NOTE: Variable Name Change. Intepret With Caution -> Intepret_With_Caution
NOTE: Variable Name Change. Data Not Reportable -> Data_Not_Reportable
NOTE: One or more variables were converted because the data type is not supported by the V9
engine. For more details, run with options MSGLEVEL=I.
WARNING: Some character data was lost during transcoding in column: Stratifier Subgroup at obs 20.
NOTE: The import data set has 40 observations and 14 variables.
NOTE: WORK.PA66_WORKING data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds

 

I'm attaching the data set. So far, I've been manually replacing ≥400% with 400+% but I just wondered whether there was any way for SAS to do this?

 

Thanks

 

13 REPLIES 13
Shmuel
Garnet | Level 18

I have looked in your input .xlsx file.

Sas replaces spaces by underscore to create a valid sas variable names.

In some cases sas cannot use input column label to create a valid sas variable name and needs to convert them. Other columns has variable length strings which may cause truncation.

 

I suggest to copy the generated data step by proc import, adapt variable names to be valid  and preferred and adapt char type variables length to max expected length.

Thus you will have a stabile program to read such file and easier to maintain in case of future changes.

 

geneshackman
Pyrite | Level 9

Thanks, but it's not the variable names with spaces that I'm asking about. It's the character in front of the 400%, the symbol for equal to or greater than.

Shmuel
Garnet | Level 18

@geneshackman wrote:

Thanks, but it's not the variable names with spaces that I'm asking about. It's the character in front of the 400%, the symbol for equal to or greater than.


What do you get instead the greater-equal symbol? Run next code and post the log:

proc options option=encoding; run;

beyond, try by next change to your code:

          out=pa66_working(encoding='UTF-8') dbms=xlsx replace;

geneshackman
Pyrite | Level 9
log output:
21 PROC IMPORT DATAFILE=
21 ! "\\dohfile02\phig\PHIGDATA\PreventionAgenda\Tableau\Data\&folderyear\pa66.xlsx"
22 out=pa66_working(encoding='UTF-8') dbms=xlsx replace;
23 sheet='2016-2017';
24 RUN;

NOTE: Variable Name Change. Data Source -> Data_Source
NOTE: Variable Name Change. Stratifier Subgroup -> Stratifier_Subgroup
NOTE: Variable Name Change. Standard Error -> Standard_Error
NOTE: Variable Name Change. Intepret With Caution -> Intepret_With_Caution
NOTE: Variable Name Change. Data Not Reportable -> Data_Not_Reportable
NOTE: Data file WORK.PA66_WORKING.DATA is in a format that is native to another host, or the file
encoding does not match the session encoding. Cross Environment Data Access will be used,
which might require additional CPU resources and might reduce performance.
NOTE: One or more variables were converted because the data type is not supported by the V9
engine. For more details, run with options MSGLEVEL=I.
WARNING: Some character data was lost during transcoding in column: Stratifier Subgroup at obs 20.
NOTE: The import data set has 40 observations and 14 variables.
NOTE: WORK.PA66_WORKING data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds


NOTE: Data file WORK.PA66_WORKING.DATA is in a format that is native to another host, or the file
encoding does not match the session encoding. Cross Environment Data Access will be used,
which might require additional CPU resources and might reduce performance.

geneshackman
Pyrite | Level 9
I don't know how it will show up, but this seems to be what is in the out data set
400%
some kind of box in front of the 400%
Tom
Super User Tom
Super User

@geneshackman wrote:
I don't know how it will show up, but this seems to be what is in the out data set
400%
some kind of box in front of the 400%

Once you have started your SAS session with an encoding that does not contain any code that means that character there is no way to fix it.  You could check what character it did get mapped to by using the $HEX format to look at it and then just change it.

data want ;
  set pa66_working;
   Stratifier_Subgroup = tranwrd(Stratifier_Subgroup,'1A'x,'=>');
run;

But that is not guaranteed to only change that character.  Any unknown character in the source will be mapped to same strange code.

 

Might be better to look for the rest of the string and then fix it.

data want ;
  set pa66_working;
  if index(Stratifier_Subgroup,'400%') then Stratifier_Subgroup = '>=400%';
run;

 

geneshackman
Pyrite | Level 9
Thanks very much!
1) Can I specify some other encoding that might include that character?
2) "look for the rest of the string and then fix it." That seems like the easiest solution.
Sajid01
Meteorite | Level 14

I have run your code on SAS University edition.

Prima facie I don't see any issue with your code and the result.

SAS does import  your spreadsheet  correctly including ≥400% 
What exactly is your issue?

 
 

 

Shmuel
Garnet | Level 18

@Sajid01 your answer relate to @geneshackman 's code.

SAS University edition is running with encoding = 'UTF-8' therefore you got the right value without any issue.

 

 

geneshackman
Pyrite | Level 9
Thanks. I don't know what utf-8 means. Can you point to a url that explains how this relates to the equal to or greater than symbol I was asking about?
Tom
Super User Tom
Super User

@geneshackman wrote:
Thanks. I don't know what utf-8 means. Can you point to a url that explains how this relates to the equal to or greater than symbol I was asking about?

Encoding is the how you are representing characters as bytes in your character strings.  The original ASCII standard for characters only use 7-bit so there are only 128 possible codes.  So there is no equal to or greater than character in the ASCII standard set of characters.  So it matters how your SAS session is encoding the data.  There are many single byte encodings that support the full 8 bits and so extend the 128 possible codes to 256 possible code, but they don't agree on what characters those extra codes represent.  Sounds like the one you are using also does not have a spot for that character.  UTF-8 can use multiple bytes to represent a single character.  So it will definitely let you represent equal to or greater than as a single character, but it will take more than one byte to store it in the variable.

 

Here is an older article that has a pretty detailed explanation.

https://blogs.sas.com/content/sgf/2017/05/19/demystifying-and-resolving-common-transcoding-problems/

 

It mainly talks about how it looks if you are launching SAS Display Manager on your PC.  If you are using SAS Enterprise Guide or SAS/Studio to run SAS on a remote server then you need to get your SAS administrator to setup a SAS application server that uses Unicode support and then connect to that application server instead of the server you are currently using.

Kurt_Bremser
Super User

UTF stands for Unicode Transfer Format.

Unicode is a character table orders of magnitude larger than the common ASCII table (which uses only 1 byte per character).

UTF-8 is the most commonly used method to encode this character table, but since several characters in the upper range of the ASCII table are used as starting characters for multi-byte sequences, the encoding of the software must match that of the data.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 5190 views
  • 8 likes
  • 5 in conversation