I have not used SAS in several years but am back needing to use it frequently. I am having trouble getting SAS to properly recognize and create numeric columns imported (PROC Import) from xlsx and csv files. Looking at old code, I did not do anything to help SAS recognize what columns were numeric vs. character. Even when SAS is doing a calculation and creating what should be a new numeric column in a dataset, it is assigning it as character. I know there are work arounds to get SAS to recognize and handle the information correctly, but I am curious if something has changed over the last several years that has made SAS less intuitive in the import process. I have a person helping me that is much more knowledgeable than I am with SAS and even he has been surprised at the difficulty I have had with getting SAS to recognize and create numeric data. The issue seems to be in the Proc Import statement and relates to not being supported with the V9 engine. My excel file is in xlsx format. I have a lot of missing data represented as . in my excel file. Any help would be appreciated.
Here is the first part of my code.
It would help if you showed us a screen capture of the results in SAS which are incorrect, and another screen capture of how the same data looks in Excel. Please use the "Insert Photos" icon to include your screen capture in your reply. Do NOT attach files.
In general, PROC IMPORT is a guessing procedure, and it is possible (this does happen sometimes) that it guesses wrong.
My datasets are very long in terms of columns and I cannot capture the entire set in a screenshot.
Are all of the columns beyond those first few key columns numeric?
If so then reading in a CSV file will be much easier. Something like below. Just make up nice SAS names for the various columns in your file. You could even try copying the first line form the CSV (or spreadsheet) and paste it into your INPUT statement. Then just change the commas to spaces and do any other polishing .
data want;
   infile 'myfile.csv' dsd truncover firstobs=2 ;
   input title :$100. pages :$30 author :$50. year 
      vara varb ............
   ;
run;
As Tom already said that's the normal challenge with floating point precision when interfacing with another "platform". Nothing SAS nor SAS interfacing with Excel specific but just how computers work.
Your column CT_value is likely still of type numeric (Proc Contents would tell you). The value 4.00000000001E-3 is likely just scientific display of a numerical value.
If you know what the max. significant decimals of your floating point numbers are then you could round them to something below to get rid of these very small (last digits) issues.
data work.mydata;
  var1=4.00000000001E-3;
  var2=0.0004000000001;
  var3='abc';
  format var1 var2 best32.17;
run;
proc print data=work.mydata;
run;
data work.mydata;
  set work.mydata;
  array numvars _numeric_;
  do over numvars;
    numvars=round(numvars,0.00000000001);
  end;
run;
proc print data=work.mydata;
run;
@Patrick wrote:
Your column CT_value is likely still of type numeric (Proc Contents would tell you). The value 4.00000000001E-3 is likely just scientific display of a numerical value.
I'm rather sure that it is character because the values appear to be left-aligned in the Viewtable and also "4.0000000000000001E-3" (15 zeros) is impossible as a value of a numeric variable in SAS (under Windows). The same holds for "7.0000000000000007E-2" in NT_value.
While 4.0000000000000001E-3 means 0.004 + 1E-19 mathematically, the least significant bit in the internal representation of 0.004 has a place value of 2**-60=8.67...E-19. So, those 1E-19 (and even 4E-19, still less than half that place value) are rounded off, which means that an import as a numeric value should result in the clean value 0.004 automatically.
It's a bit surprising, though, that the numeric literal 7.0000000000000007E-2 (mathematically: 0.07+7E-18) in SAS is equal to 0.07 (on my Windows workstation, regardless of the DECIMALCONV system option setting) and not rounded up, although 7E-18 is greater than half the place value 2**-56=1.38...E-17 of the least significant bit. However, SAS correctly rounds the sum 0.07+7E-18 up and the sum 0.07+6E-18 down.
Your example 4.00000000001E-3 (0.004+1E-14) and the number 4.00000000000001E-3 (0.004+1E-17) mentioned by the OP are far enough apart from 0.004 so that, as you rightly suggest, explicit rounding would be required to obtain 0.004.
CSV files and XLSX files are completely different things and will have different issues when being used by SAS (and by EXCEL for that matter).
A CSV file is a simple text file consisting of lines of text. Each line has multiple values delimited by a comma (you can change the delimiter but the concept is the same). There is NOTHING in a CSV that describes what types of values the text on the lines represent. The reader of the file has to decide how to interpret the text that appears.
An XLSX file is binary object that contains one or more worksheets. Each cell in a worksheet can contain anything.
A SAS dataset consists of observations and variables. The variables must contain the same type of date (numeric or character) on every observation. So mapping a column in a worksheet into a variable in a dataset will cause problems when a column consists of cells that contain different types of values. When that happens SAS will be forced to define the variable is character (you can represent a number as a character string but you cannot represent every character string as a number).
The other issue that PROC IMPORT will have with XSLX files is with empty columns. If every cell in the column is empty then SAS will create a character variable of length $1. (It takes less space to store and a numeric variable).
For a CSV file you can have complete control over how SAS reads the file by just writing your own data step. But for an XSLX file you cannot.
@DanFleming wrote:
I prefer to stick with xlsx files but converted to csv to see if that fixed the issues. It did not. One thing I am noticing is that a value in excel of 0.004 is being converted to 4.00000000000001E-3. Most of my numeric columns have an occurrence of that. Does that help to better understand the issue?
That is just normal floating point precision issues. You cannot represent the decimal value 0.004 exactly using floating point binary numbers. Excel and SAS might not use the exact same floating point representations, hence you can see these types of trivial differences.
You can expect to find that type of issue with any work using fractional values.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
