Emma_at_SAS
Lapis Lazuli | Level 10

I have hospital data with the price each patient has paid for visiting the emergency department. When I combine the data sets from different hospitals, I get this WARNING message in my SAS log:

WARNING: Multiple lengths were specified for the variable ER_EXPENSE

but my combined data looks fine for the variable ER_EXPENSE.

I tried the following example to see how SAS handles non-matching LENGTH for numeric variables. The result looks fine, but I do not get that WARNING message in my example. Any thoughts?

data test1;
input ER_EXPENSE 8.2;
datalines;
12345.78
11111.11
;
run;
data test2;
input ER_EXPENSE 4.0;
datalines;
1234
1111
;
run;
data combined;
format ER_EXPENSE 12.2;
set test1 test2;
run;

 

thanks

8 REPLIES
ballardw
Super User

You did not specify a length for any of the variables so they all default to 8. Therefore no warning.

 

You may want to run this and look at the results of the data where LENGTH is specified:

/* Same two values stored in 3 bytes */
data test1;
  length er_expense 3;
  input ER_EXPENSE 8.2;
datalines;
12345.78
11111.11
;

/* Stored in 4 bytes */
data test2;
  length er_expense 4;
  input ER_EXPENSE 8.2;
datalines;
12345.78
11111.11
;

/* Stored in 5 bytes */
data test3;
  length er_expense 5;
  input ER_EXPENSE 8.2;
datalines;
12345.78
11111.11
;

/* Stored in the full 8 bytes (the default) */
data testlast;
  length er_expense 8;
  input ER_EXPENSE 8.2;
datalines;
12345.78
11111.11
;

General rule with SAS: if a value is not an integer, do not specify a length. The length limits the number of bytes the value is stored in, which can produce some pretty odd results, even worse than the normal limit of 14 or 15 significant digits of precision. You may need to trace the history of one or more data sets back to where they were brought into SAS and change the process so that it stops setting the length to less than 8.

 

You can reduce possible data content issues, and remove the warning, by specifying a length for the variable(s) before the statement that combines the data:

data combined;
   length er_expense 8;
   set test1 test2 test3 testlast;
run;

Note: this combined data set will still show different values for the variable, because the shorter lengths could not hold all the information when the values were first read, so those values were already corrupted.
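
To see which source data set each stored value came from, something like the following sketch may help. It assumes the four data sets above (test1, test2, test3, testlast) already exist in WORK; the names combined_check, source and from are only illustrative.

```sas
/* Sketch: combine the four test data sets and keep track of the source. */
data combined_check;
   length er_expense 8;                     /* declare full length before SET */
   set test1 test2 test3 testlast indsname=source;
   from = source;                           /* INDSNAME= variable is not kept, so copy it */
run;

/* Print with extra decimals so any loss of precision is visible. */
proc print data=combined_check;
   format er_expense 12.4;
run;
```

Rows that come from the data sets with shorter lengths should differ from the values that were typed in, while the rows from testlast should match.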

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Emma_at_SAS 

 

A numeric variable in SAS is always a floating-point number. The physical variable can be anything from 3 to 8 bytes, and the number of bytes determines how many significant digits can be stored.

See, for example, this link: https://www.listendata.com/2016/12/sas-length-of-numeric-variables.html

 

In your test you create two data sets with a numeric variable, but you haven't specified a length, so the variables are created with the default length of 8 and they append without a warning. But if SAS says "Multiple lengths were specified ...", your real input has variables of different lengths.

 

When working with large data sets, variables might be created shorter than 8 bytes to conserve disk space, if the programmer knows for certain that all values can be represented in the shorter length. I think it can also happen when SAS data sets are imported from databases where columns are defined shorter, e.g. as TINYINT instead of DOUBLE or FLOAT, but I am not sure, as I haven't tested it.

 

But run a PROC CONTENTS on your input and see the actual lengths. Then preprocess the input, e.g. by creating SQL views with the lengths defined as 8. This ought to solve your problem.
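
A minimal sketch of those two steps, assuming the test1 and test2 data sets from the earlier reply exist in WORK. The view names test1_v and test2_v and the data set name combined_v are made up for illustration, and real data would need the other columns listed in each SELECT as well.

```sas
/* Step 1: check the stored length of the variable in each input. */
proc contents data=test1; run;
proc contents data=test2; run;

/* The same information for all WORK data sets in one query. */
proc sql;
   select memname, name, type, length
      from dictionary.columns
      where libname = 'WORK' and upcase(name) = 'ER_EXPENSE';

   /* Step 2: views that present the variable with length 8. */
   create view test1_v as
      select er_expense length=8 from test1;
   create view test2_v as
      select er_expense length=8 from test2;
quit;

/* Combine the views instead of the original data sets. */
data combined_v;
   set test1_v test2_v;
run;
```

This only makes the lengths agree. Values that were already truncated when they were stored in fewer bytes are not restored.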

Emma_at_SAS
Lapis Lazuli | Level 10

Thank you @ballardw  and @ErikLund_Jensen  for sharing your experience. I used to think the 8.2 in the INPUT statement defines the length. 

In my real data, I cannot go back to an earlier version of the data sets I got from the hospitals. Is there a safe length I can define when I combine the datasets, or may I ignore the SAS warning and my combined data will be fine?

 

Thanks

Kurt_Bremser
Super User

Always set the length of numeric variables to 8 (which is the maximum and the default).

The informat 8.2 only defines how data is read (or converted from a character string) and has no effect on the storage length.
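
As a quick check, here is a small sketch using the VLENGTH function, which returns the storage length of a variable. The variable name len is just for illustration.

```sas
/* Sketch: the informat width (8.2) and the storage length are separate things. */
data _null_;
   input ER_EXPENSE 8.2;            /* informat: how the text is read */
   len = vlength(ER_EXPENSE);       /* storage length in bytes */
   put ER_EXPENSE= len=;
datalines;
12345.78
;
```

With no LENGTH statement, the log should report a length of 8, whatever width the informat uses.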

Emma_at_SAS
Lapis Lazuli | Level 10

Thank you both for your help. All your responses together helped me to solve my problem. Now, I am not sure which one to select as the solution to be helpful to future SAS users. Please let me know if you have any thoughts on that. Thanks

Tom
Super User

Why are you telling SAS to divide strings without an explicit decimal point by 100?

 

That is what the .2 in your informat 8.2 means.

 

So if you read the line 

10025

using 8.2 informat the result is 100.25 and not 10,025.

 

The only reason to ever specify a number of decimal places on an informat is when you are reading values that were purposely written without the decimal point to save an extra character.
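
A small sketch to compare the two informats on the same input lines. The variable names with_d and without_d are only illustrative.

```sas
/* Sketch: read each line twice, once with 8.2 and once with 8. */
data _null_;
   input @1 with_d 8.2 @1 without_d 8.;
   put with_d= without_d=;
datalines;
10025
100.25
;
```

For the first line the two results differ by a factor of 100, because the text has no decimal point. For the second line both informats give the same value, because an explicit decimal point in the data takes precedence over the .2.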

Emma_at_SAS
Lapis Lazuli | Level 10

Thank you very much @Tom  for mentioning this point about using informat. I have to review some resources to learn how SAS handles variables and the use of informat and length to make sure I do not change my data unintentionally. Thanks!

Tom
Super User

@Emma_at_SAS wrote:

Thank you very much @Tom  for mentioning this point about using informat. I have to review some resources to learn how SAS handles variables and the use of informat and length to make sure I do not change my data unintentionally. Thanks!


Just remember that if you are not setting the LENGTH then you are not directly defining the variable. Instead you are letting SAS make a guess at how you intended the variable to be defined, based on how you first use it. So if the first use is in a FORMAT or INFORMAT statement, SAS will base its guess on the format or informat being attached to the variable. SAS will also base it on the informat when the first use is in an INPUT statement. If the first use is in an assignment statement, then it will try to guess based on the role the variable plays there. So if you first reference a variable by assigning it a constant value or comparing it to one, it will guess based on the constant. If you assign it the value of a variable, it will guess so that it matches the type and length of that variable.
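
A minimal sketch of the assignment case, using the VTYPE and VLENGTH functions to show what SAS decided. All variable names here are made up for illustration.

```sas
/* Sketch: no LENGTH statement, so SAS defines each variable from its first use. */
data _null_;
   a = 'abc';      /* first use: character constant */
   b = 12;         /* first use: numeric constant */
   c = a;          /* first use: copy of another variable */

   type_a = vtype(a);  len_a = vlength(a);
   type_b = vtype(b);  len_b = vlength(b);
   type_c = vtype(c);  len_c = vlength(c);

   put type_a= len_a= / type_b= len_b= / type_c= len_c=;
run;
```

Here a and c should come out as character variables of length 3, and b as a numeric variable of length 8.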
