Hello,
Maybe someone could explain me the proc import comportement in encoding the values:
Exemple: I have the value 6.66% in my excel file,
proc import translate this value in character value 6.66666
When I'm forcing by adding the 0 to this values I have the correct values in table: 6.66+0 =0.066.
I can explain the transformation: 6.66666 (character) +0 =0.06666 (numeric)
Thank you
Please post a small example that demonstrates the issue. Do you have an XLSX file or the older XLS format?
Note that Excel is a spreadsheet where every cell can have a different type of value, but SAS datasets require that each variable has only one type for all observations. Make sure that there are no character strings in other cells in the column that has the 6.6% value. When the column has mixed data then SAS must create a character variable. It is possible to represent numbers using character strings, but trying to represent strings as numbers is not really possible.
proc import
datafile= "xxx\Exemple.xlsx"
out=Retours
replace dbms=xlsx;
getnames=yes;
run;
data Retours1;
set Retours;
Exemple2=(Exemple+0);
Exemple3=compress(put(round(Exemple2*100,0.01),10.2)||'%');
if Exemple2^=. then Exemple=Exemple3;
run;
Please attach an example of such an Excel file, and post your import code, so we can recreate your problem.
Thank you!
proc import
datafile= "xxxx\Exemple.xlsx"
out=Retours
replace dbms=xlsx;
getnames=yes;
run;
data Retours1;
set Retours;
Exemple2=(Exemple+0);
Exemple3=compress(put(round(Exemple2*100,0.01),10.2)||'%');
if Exemple2^=. then Exemple=Exemple3;
run;
I used the usual "get rid of Excel woes" method:
I saved the file from Excel to a csv and copied it unto our server, then ran this code:
data test;
infile '$HOME/sascommunity/Exemple.csv' truncover firstobs=2;
input stringval $20.;
stringval = translate(stringval,'.',',');
if stringval not in ('','NC*') then numval = input(stringval,percent20.);
format numval nlpct7.2;
run;
proc print data=test noobs;
run;
Partial result:
stringval numval 6.66% 6,66% 0.67% 0,67% -6.00% -6,00% 7.2311% 7,23% 8.80% 8,80% . 2.60% 2,60% 1.20% 1,20% NC* . 0.80% 0,80% NC* . NC* . 0.34% 0,34% 0.97% 0,97% 0.80% 0,80% 0.80% 0,80% NC* . 0.65% 0,65% NC* . NC* . 0.64% 0,64% NC* . NC* . NC* . NC* . 1.00% 1,00% NC* . 0.72% 0,72% 1.00% 1,00% 0.52% 0,52% 0.83% 0,83% 0.52% 0,52% 0.98% 0,98% NC* . NC* . NC* . NC* . 0.65% 0,65% 0.04% 0,04% NC* . NC* . 0.50% 0,50% NC* . 0.56% 0,56% NC* . 0.69% 0,69% 0.60% 0,60% 0.60% 0,60% 0.60% 0,60% 0.75% 0,75% 0.75% 0,75% NC* . 1.15% 1,15% NC* . NC* .
Bottom line: Excel files are useless for data transfer, textuality rules, as it allows you to easily make corrections.
Go back to your Excel file. Examine VERY closely each value in the first 20 rows for the column that is giving you problems. By default SAS uses the values of the first 20 rows to set the properties of variables. Values like NA, NULL or anything not numeric in those rows means SAS usually assign a character property and then use the values to set how long.
Just because it looks numeric does not mean the cell has an actual numeric value, especially if the data was manually entered. The underlying issue is that Excel, and most spreadsheets, to not have an actual concept of datatype for a column. The same column can contain text, numeric values, dates, times. I won't even go into people that use cell colors to indicate actual values. So processes have to make some choice what to do with the column. With only 20 rows examined then the guessing algorithm may be working with a very incomplete set of data.
The better way to convert character to numeric is to use the Input function: newvar = input(charvar,informat.); That will prevent getting messages in the Log like "character values have been converted to numeric at" . Which is sometimes an indication of not quite right code, especially if you did not expect a variable to be character.
"By default SAS uses the values of the first 20 rows to set the properties of variables."
I believe with xlsX all rows get analyzed. I've just run a test using an Excel sheet where in column A the first 10000 rows had a value of 1 (numeric) and row 10001 a value of 'a'. The resulting SAS variable became character with a length of 1.
Using Proc Import with SAS9.4M5 under Rhel I can replicate what you describe. Here what I'm getting as a result (in a character column)
This looks like SAS reads at least for some of the cells the "raw" values from the sheet1.xml which doesn't appear to be the correct behaviour (even though your Excel data is especially ugly).
<row r="2" spans="1:1" x14ac:dyDescent="0.25"> <c r="A2" s="3"> <v>6.6600000000000006E-2</v> </c> </row> <row r="3" spans="1:1" x14ac:dyDescent="0.25"> <c r="A3" s="3"> <v>6.6600000000000001E-3</v> </c> </row> <row r="4" spans="1:1" x14ac:dyDescent="0.25"> <c r="A4" s="3"> <v>-0.06</v> </c> </row> <row r="5" spans="1:1" x14ac:dyDescent="0.25"> <c r="A5" s="4"> <v>7.2311110999999997E-2</v> </c> </row> <row r="6" spans="1:1" x14ac:dyDescent="0.25"> <c r="A6" s="3" t="s"> <v>27</v> </c> </row>
If I use the SAS EG8.2 import wizard then the result looks better. Still strings of course but at least wysiwig and something you can post-process without having to know how Excel stores values internally.
What Proc Import does to your Excel data is may-be something you should raise with SAS Technical Support.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.