BookmarkSubscribeRSS Feed
DanFleming
Calcite | Level 5

I have not used SAS in several years but am back needing to use it frequently. I am having trouble getting SAS to properly recognize and create numeric columns imported (PROC Import) from xlsx and csv files. Looking at old code, I did not do anything to help SAS recognize what columns were numeric vs. character. Even when SAS is doing a calculation and creating what should be a new numeric column in a dataset, it is assigning it as character. I know there are work arounds to get SAS to recognize and handle the information correctly, but I am curious if something has changed over the last several years that has made SAS less intuitive in the import process. I have a person helping me that is much more knowledgeable than I am with SAS and even he has been surprised at the difficulty I have had with getting SAS to recognize and create numeric data. The issue seems to be in the Proc Import statement and relates to not being supported with the V9 engine. My excel file is in xlsx format. I have a lot of missing data represented as . in my excel file. Any help would be appreciated. 

Here is the first part of my code.

dm "output" clear;
dm "log" clear;
Proc Import datafile = "J:\WRRI\Everyone\Transport MetaAnalysis\Pesticide transport meta-analysis\Pesticide transport meta-analysis.xlsx"
                        out=mydata
                      DBMS=xlsx replace;
Sheet='Master';
run;
data a; set mydata;
keep Latitude Longitude 
Sand Silt Clay OM OC pH Soil_type Crop_rotation Crop Rainfall_type Cover_crop  
Pest_incorporated KOW logP Water_solubility 
Year_in_trial 
Type_resp_var Response_variable Measure 
Year_weights Analysis
CT_NT_resp_ratio CT_ConsT_resp_ratio ConsT_NT_resp_ratio;
if cover_crop = "Yes" or Pest_incorporated = "Yes" or Analysis = "SEdiff" or Year_in_trial = "." or response_variable ne "Water pesticide loss" then delete;
run;
proc sort data=a;
    by response_variable year_in_trial KOW OC;
8 REPLIES 8
PaigeMiller
Diamond | Level 26

It would help if you showed us a screen capture of the results in SAS which are incorrect, and another screen capture of how the same data looks in Excel. Please use the "Insert Photos" icon to include your screen capture in your reply. Do NOT attach files.

 

In general, PROC IMPORT is a guessing procedure, and it is possible (this does happen sometimes) that it guesses wrong. 

--
Paige Miller
DanFleming
Calcite | Level 5

My datasets are very long in terms of columns and I cannot capture the entire set in a screenshot.Screenshot.pngScreenshot 2.png

Tom
Super User Tom
Super User

Are all of the columns beyond those first few key columns numeric?

If so then reading in a CSV file will be much easier.  Something like below.  Just make up nice SAS names for the various columns in your file.  You could even try copying the first line form the CSV (or spreadsheet) and paste it into your INPUT statement.  Then just change the commas to spaces and do any other polishing .

data want;
   infile 'myfile.csv' dsd truncover firstobs=2 ;
   input title :$100. pages :$30 author :$50. year 
      vara varb ............
   ;
run;

 

Patrick
Opal | Level 21

As Tom already said that's the normal challenge with floating point precision when interfacing with another "platform". Nothing SAS nor SAS interfacing with Excel specific but just how computers work.

 

Your column CT_value is likely still of type numeric (Proc Contents would tell you). The value 4.00000000001E-3 is likely just scientific display of a numerical value. 

 

If you know what the max. significant decimals of your floating point numbers are then you could round them to something below to get rid of these very small (last digits) issues.

data work.mydata;
  var1=4.00000000001E-3;
  var2=0.0004000000001;
  var3='abc';
  format var1 var2 best32.17;
run;
proc print data=work.mydata;
run;

data work.mydata;
  set work.mydata;
  array numvars _numeric_;
  do over numvars;
    numvars=round(numvars,0.00000000001);
  end;
run;
proc print data=work.mydata;
run;

Patrick_0-1714529740609.png

 

FreelanceReinh
Jade | Level 19

@Patrick wrote:

Your column CT_value is likely still of type numeric (Proc Contents would tell you). The value 4.00000000001E-3 is likely just scientific display of a numerical value.


I'm rather sure that it is character because the values appear to be left-aligned in the Viewtable and also "4.0000000000000001E-3" (15 zeros) is impossible as a value of a numeric variable in SAS (under Windows). The same holds for "7.0000000000000007E-2" in NT_value.

 

While 4.0000000000000001E-3 means 0.004 + 1E-19 mathematically, the least significant bit in the internal representation of 0.004 has a place value of 2**-60=8.67...E-19. So, those 1E-19 (and even 4E-19, still less than half that place value) are rounded off, which means that an import as a numeric value should result in the clean value 0.004 automatically.

 

It's a bit surprising, though, that the numeric literal 7.0000000000000007E-2 (mathematically: 0.07+7E-18) in SAS is equal to 0.07 (on my Windows workstation, regardless of the DECIMALCONV system option setting) and not rounded up, although 7E-18 is greater than half the place value 2**-56=1.38...E-17 of the least significant bit. However, SAS correctly rounds the sum 0.07+7E-18 up and the sum 0.07+6E-18 down.

 

Your example 4.00000000001E-3 (0.004+1E-14) and the number 4.00000000000001E-3 (0.004+1E-17) mentioned by the OP are far enough apart from 0.004 so that, as you rightly suggest, explicit rounding would be required to obtain 0.004.

Tom
Super User Tom
Super User

CSV files and XLSX files are completely different things and will have different issues when being used by SAS (and by EXCEL for that matter).

 

A CSV file is a simple text file consisting of lines of text. Each line has multiple values delimited by a comma (you can change the delimiter but the concept is the same).  There is NOTHING in a CSV that describes what types of values the text on the lines represent.  The reader of the file has to decide how to interpret the text that appears.

 

An XLSX file is binary object that contains one or more worksheets. Each cell in a worksheet can contain anything.

 

A SAS dataset consists of observations and variables.  The variables must contain the same type of date (numeric or character) on every observation.  So mapping a column in a worksheet into a variable in a dataset will cause problems when a column consists of cells that contain different types of values.  When that happens SAS will be forced to define the variable is character (you can represent a number as a character string but you cannot represent every character string as a number).

 

The other issue that PROC IMPORT will have with XSLX files is with empty columns.  If every cell in the column is empty then SAS will create a character variable of length $1.  (It takes less space to store and a numeric variable).

 

For a CSV file you can have complete control over how SAS reads the file by just writing your own data step.  But for an XSLX file you cannot. 

DanFleming
Calcite | Level 5
I prefer to stick with xlsx files but converted to csv to see if that fixed the issues. It did not. One thing I am noticing is that a value in excel of 0.004 is being converted to 4.00000000000001E-3. Most of my numeric columns have an occurrence of that. Does that help to better understand the issue?
Tom
Super User Tom
Super User

@DanFleming wrote:
I prefer to stick with xlsx files but converted to csv to see if that fixed the issues. It did not. One thing I am noticing is that a value in excel of 0.004 is being converted to 4.00000000000001E-3. Most of my numeric columns have an occurrence of that. Does that help to better understand the issue?

That is just normal floating point precision issues.  You cannot represent the decimal value 0.004 exactly using floating point binary numbers.  Excel and SAS might not use the exact same floating point representations, hence you can see these types of trivial differences.

 

You can expect to find that type of issue with any work using fractional values.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 345 views
  • 0 likes
  • 5 in conversation