Please show what you did.
Note that for NUMERIC variables the DATA step always works with the full 8-byte floating-point values. It is only when the value is written to disk that the LENGTH attribute comes into effect. That is why you can change the length of a numeric variable even after the variable has been "seen" by the DATA step compiler.
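Here is a minimal sketch of that behavior. It assumes the SASHELP.CLASS sample table that ships with SAS. The LENGTH statement for the numeric AGE comes after the SET statement, yet the output dataset should still store AGE in 3 bytes. Trying the same thing to lengthen a character variable typically just produces a warning that its length has already been set.

```sas
/* Numeric LENGTH placed AFTER the SET statement still takes effect,   */
/* because it only controls how many bytes are written to disk.        */
data class_short;
  set sashelp.class;
  length age 3;   /* AGE values are small integers, so 3 bytes is enough */
run;

/* The stored length shows up in the descriptor portion */
proc contents data=class_short varnum;
run;
```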
I don't quite understand what you mean. SAS doesn't care if we specify the length beforehand to prevent issues? I just did something simple: ACRES is a numeric variable with length 8 but format COMMA12. (and the displayed values are wider than 8 characters). So how can it be displayed in 12 positions if its raw value only takes 8? If I change the length to 4, nothing changes in the output even though the attribute shows 4. So is length only used for character variables?
data temp; length acres 4.; set '/home/u49936438/EPG1V2/data/np_final.sas7bdat'; run;
Alternatively, if I create a new variable, changing length still has no impact on output:
data temp; length new 4.; set '/home/u49936438/EPG1V2/data/np_final.sas7bdat'; new = 23676756; run;
Your mental model of what is happening is not right. LENGTH determines how many bytes the values take when stored in the dataset. Formats convert values to text, they have no impact on how the values are stored, just how they are displayed. Informats convert text to values, they have no impact on how the values are stored, just how they are created from the source text.
Another side effect of confusing LENGTH and format or informat width is the inclusion of the period in your LENGTH statement. SAS uses periods in FORMAT or INFORMAT specifications to allow the parser to distinguish variable names from format or informat specifications. But the LENGTH of a variable can only be an integer, so there is no need to add a period in the LENGTH statement.
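For contrast, here is a small step with both statements side by side. The dataset ACRES_DEMO and its values are made up for illustration. LENGTH takes plain integers (with a $ for character variables), while FORMAT takes a format name ending in a period.

```sas
/* Hypothetical example dataset and values */
data acres_demo;
  length parkname $40 acres 8;   /* LENGTH: byte counts, no period        */
  format acres comma12.;         /* FORMAT: name + width + period          */
  parkname = 'Example Park';
  acres    = 1234567;            /* displays as 1,234,567; stored in 8 bytes */
run;
```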
I might be getting confused because what's happening for character variables isn't happening for numeric. And I don't understand why. Is there any reason to change the length of a numeric? I don't see how it affects anything.
Changing the length of a numeric column changes the largest integer that column can store exactly. Don't worry about changing numeric columns. Just leave them alone at the default 8 bytes.
Read the doc for more information.
If you want to see a simple example of how the length can affect the storage of numbers, please run this code and then examine the values.
data example; length x y 3; x = 1/3; y=123456789; run;
We would expect X to be the repeating decimal 0.3333333333333 (etc.), which even with 8 bytes would normally run into a storage precision limit around the 15th or 16th decimal place. What is the value of X if you print or view the data set?
What is the value of Y? The length limits the number of bytes of computer storage a value uses. If the value doesn't fit, then, depending on what a program chooses to do, it may round (or truncate) the data value to fit as closely as possible into the storage.
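To check your answers, you could use a sketch like the following, which prints the stored values next to full-precision copies. It assumes the EXAMPLE dataset above already exists and that you are on a Windows or UNIX host (IEEE floating point). The new variable names are made up. BEST32. is used so the default BEST12. display does not hide any digits.

```sas
/* Assumes WORK.EXAMPLE from the step above (X and Y stored in 3 bytes) */
data compare;
  set example;
  full_x = 1/3;           /* computed in the PDV: full 8-byte precision  */
  full_y = 123456789;
  lost_y = full_y - y;    /* amount Y lost when it was stored in 3 bytes */
run;

proc print data=compare;
  format x full_x y full_y lost_y best32.;
run;
```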
Moral of the story: do not use length less than 8 for numeric values that will have decimal portions.
Once upon a time computer memory was expensive, so tools like LENGTH could be used to reduce memory use for values whose range you "knew" in advance. If you were using a numeric variable to hold code values that never exceeded 8000 (and generally were not doing arithmetic with them), a small length was acceptable.
Search the documentation for the phrase "Length and precision of variables".
Key for numeric is the following table which explains why Y in my example has the value it does.
| Length in Bytes | Largest Integer Represented Exactly | Exponential Notation | Significant Digits Retained | 
|---|---|---|---|
| 3 | 8,192 | 2^13 | 3 |
| 4 | 2,097,152 | 2^21 | 6 |
| 5 | 536,870,912 | 2^29 | 8 |
| 6 | 137,438,953,472 | 2^37 | 11 |
| 7 | 35,184,372,088,832 | 2^45 | 13 |
| 8 | 9,007,199,254,740,992 | 2^53 | 15 |
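One way to check the 3-byte row of the table yourself is sketched below. It assumes a Windows or UNIX host, and LEN3 and LEN3_CHECK are hypothetical names. The sketch writes integers around 8,192 into a 3-byte variable, reads them back, and compares them with an 8-byte copy.

```sas
/* Write integers around 8,192 to a 3-byte numeric variable. */
data len3;
  length code 3;            /* CODE is stored in 3 bytes on disk          */
  do i = 8190 to 8194;      /* I keeps the default length of 8 bytes      */
    code = i;
    output;
  end;
run;

/* Read them back: the dropped low-order bytes come back as zeros. */
data len3_check;
  set len3;
  exact = (code = i);       /* 1 if the value survived 3-byte storage     */
run;

proc print data=len3_check;
run;
```

Every integer up to 8,192 survives. 8,193 does not. 8,194 happens to survive because it is even. That is why the table's column is best read as the limit up to which *all* integers are exact.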
Hi: 
The format specified for a variable has nothing to do with LENGTH. Think of FORMAT as how you want the number or string to appear in a report, for viewing. The INTERNAL value is what is "touched" by the LENGTH statement. 
In Programming 1, we explain that a SAS dataset is composed of a descriptor portion and a data portion. Information in the descriptor portion includes both the LENGTH (for storage purposes) and the FORMAT (for display purposes).
As an example, for internal reporting, I might want to show my numeric values with commas as thousands separators and no decimal places. However, for any reports for the Accounting department for internal purposes, they want to see the currency symbol, the thousands separators and the decimal places, even if the decimals are always .00 (they want to see that). The usefulness of a FORMAT comes into play because I can take the same value, stored internally as a double-precision floating-point number, and display it on one report with just commas and on another report with currency and decimals.
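Here is a small sketch of that idea, using a hypothetical BUDGET dataset and value. The stored value never changes. Each PROC PRINT step attaches its own display format.

```sas
/* Hypothetical dataset holding one stored value */
data budget;
  amount = 1234567;
run;

title 'Internal report';
proc print data=budget noobs;
  format amount comma12.;     /* displays 1,234,567     */
run;

title 'Accounting report';
proc print data=budget noobs;
  format amount dollar14.2;   /* displays $1,234,567.00 */
run;
title;                        /* clear the title        */
```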
There is usually not a good reason to change the length of a numeric variable. The only time I ever needed to do it was to save some storage space for historical enrollment and registration files that were big (at the time, terabytes of data; not so big now, but big then). As an example, we had a GRADE variable that was inherited from a legacy mainframe system. The values for GRADE in the data were only ever 1, 2, 3 and 4, so the number didn't need to be stored in 8 bytes. When the values were displayed, they needed to appear as Freshman, Sophomore, Junior, Senior. We used a format to take the numeric values and display them as the meaningful grade levels. Since we were saving space, they did not want to store character values, and they didn't want to convert the numeric GRADE to character because the legacy programs were expecting GRADE to be numeric. We defined GRADE with a numeric LENGTH of 3 for storage, because the smallest length you can define for numeric variables in SAS is 3 bytes. The number of bytes used for storage only mattered here because storing a number whose maximum value was 4 in 3 bytes instead of 8 saved space on the disk drive where the historical files were stored.
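The original code isn't shown, so here is a sketch of the GRADE setup as described, with hypothetical names (GRADEFMT, ENROLLMENT). A user-written format supplies the labels, and LENGTH 3 controls only the storage.

```sas
/* User-written format: turns the stored codes into labels for display */
proc format;
  value gradefmt
    1 = 'Freshman'
    2 = 'Sophomore'
    3 = 'Junior'
    4 = 'Senior';
run;

data enrollment;
  length grade 3;            /* stored in 3 bytes; still a numeric variable */
  format grade gradefmt.;    /* display only; the stored values stay 1-4    */
  do grade = 1 to 4;
    output;
  end;
run;

proc print data=enrollment;
run;
```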
Just as a point of interest, this documentation page: https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/hostwin/n04ccixfia6l2pn1f8szvttqg3hm.htm has a quick chart showing the largest integer that can be represented exactly for each numeric length, up to 8 bytes. It's worth taking a look.
So the LENGTH you set for a numeric variable is going to impact the largest integer you can STORE exactly in the variable. But the DISPLAY of the value is controlled by the FORMAT you specify in your reporting procedure.
Cynthia
I've gotten a lot of answers, and it seems like in general I do not need to mess with the length of numeric variables for now. The reasoning seems complicated, and it is not explained in the videos. Where I also get lost is how bytes could affect format/length. It seems like they should have at least mentioned this, which I don't think they did.
To get back to your original question: What makes you think it worked for the character variables but not for the numeric variables?
Please show the code you tried to use to change lengths of variables and explain your question and/or issues with how they worked.
Changing the length will have an impact on what is written to the dataset. Basically, if you set the length of a numeric variable to less than the full 8 bytes, SAS will not store the lower-order bytes of the number. When you later read the dataset back in, those bytes will be set to binary zero. So the result is a loss of precision in representing the numbers. If you are just storing integers, the impact is minimal until you get to really short storage lengths.
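The value from the earlier LENGTH NEW 4 example makes a handy test, because 23,676,756 is well above the 2,097,152 limit for 4 bytes in the table. The sketch below assumes a Windows or UNIX host. IN_PDV and LOST are made-up names.

```sas
/* Inside the step, NEW still holds the full 8-byte value in the PDV. */
data len4;
  length new 4;
  new    = 23676756;
  in_pdv = new;          /* 8-byte copy taken before anything hits disk */
run;

/* After the round trip, only the 4 stored bytes of NEW survive. */
data len4_check;
  set len4;
  lost = in_pdv - new;   /* precision lost by storing NEW in 4 bytes */
run;

proc print data=len4_check;
  format new in_pdv lost best32.;
run;
```

When you print the result, NEW should be slightly lower than IN_PDV, even though both were equal inside the step that created them.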
How the values are displayed is another issue. Changing the length will not change what is displayed (unless it causes a loss of precision). How the value is displayed is controlled by the FORMAT attached to the variable (or the format specified in the PROC that is creating the printed output). So if you have the same variable in two datasets and combine them into one, SAS will pick the first non-empty format that it sees to assign to the variable. You can override that by adding a FORMAT statement to your DATA step to change the format that is attached to the variable.
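Here is a sketch of that rule with two hypothetical datasets, A and B, which attach different formats to the same AMOUNT variable.

```sas
data a;
  amount = 1234.5;
  format amount comma10.2;
run;

data b;
  amount = 99.25;
  format amount dollar10.2;
run;

/* No FORMAT statement: AMOUNT inherits COMMA10.2 from A, */
/* the first dataset on the SET statement.                */
data combined_default;
  set a b;
run;

/* A FORMAT statement in the step overrides the inherited format. */
data combined;
  set a b;
  format amount dollar10.2;
run;
```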
Does using PUT and INPUT change the actual raw variable type, whereas a format does not? I'm confused because in one example they converted from character to numeric, but then the decimals disappeared and only integers were shown. How is that accurate if we lose the decimal places?
@jaliu wrote:
Does using PUT and INPUT change the actual raw variable type, whereas a format does not? I'm confused because in one example they converted from character to numeric, but then the decimals disappeared and only integers were shown. How is that accurate if we lose the decimal places?
Formats convert values to text. Informats convert text to values.
So the PUT() function always generates text. The INPUT() function can generate either text or a number, depending on the type of informat being used.
So yes the PUT() and INPUT() functions output different things than they input.
But that does not change the fact that assigning a format to a variable does not change how the variable is stored. Just how it is displayed.
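Here is a small sketch of both points, using hypothetical dataset and variable names. PUT() turns a number into text. INPUT() with a numeric informat turns that text back into a number. A format like 8. hides the decimals on display without removing them from the stored value.

```sas
data convert;
  num   = 1234.56;
  text  = put(num, comma10.2);   /* PUT: number -> character '1,234.56' (right-aligned) */
  back  = input(text, comma10.); /* INPUT with a numeric informat: text -> number     */
  shown = put(num, 8.);          /* the 8. format rounds the display to 1235          */
  format num 8.;                 /* printed output shows 1235...                      */
run;                             /* ...but NUM still stores 1234.56                   */

proc print data=convert;
run;
```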
Not sure what examples you are talking about. Remember that SAS stores numbers as floating point binary numbers. They are stored using base 2 instead of base 10. Different fractions can be represented exactly by base 2 than can be represented exactly by base 10.
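Here is a classic illustration of the base-2 point, using a made-up dataset called FP. The values 0.1, 0.2 and 0.3 have no exact binary representation, so on an IEEE host (Windows or UNIX) the sum of the first two is not bit-for-bit equal to the third. Rounding both sides to a sensible precision before comparing avoids the surprise.

```sas
data fp;
  sum = 0.1 + 0.2;
  exact_equal   = (sum = 0.3);                            /* 0: binary values differ */
  rounded_equal = (round(sum, 1e-9) = round(0.3, 1e-9));  /* 1 after rounding        */
run;

proc print data=fp;
  format sum best32.;
run;
```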
Hi:
I think you are talking about the example in the Programming 1 or Programming 2 course where we show how to use the INPUT() function to create a NEW numeric variable from an existing character variable. Without knowing the course or the Lesson # or Practice #, it's hard to comment on the specific assignment. I thought that the lectures immediately before the assignment laid out a business scenario for when you might need to create new variables of a different type than their original type. So the INPUT() and PUT() functions are discussed to show students how to explicitly do this kind of conversion. This is a very common need for people who receive files from different sources and a variable might have the same name, maybe something like ACCOUNT in both sources, but it was defined as character in one file and defined as numeric in another file. In order to use the two files together, one of the ACCOUNT variables will need to be changed -- if the decision is that ACCOUNT needs to be character, then the PUT function would be used to create a new variable. If the decision is that ACCOUNT needs to be numeric, then the INPUT function would be used to create a new variable. However, I believe in the lecture where we show the PUT and INPUT functions, we also talk about using the RENAME option because you can't change the type of a variable once it has been defined, so there's a bit more to do to make ACCOUNT the same type in both files.
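The course code isn't shown here, so this sketch of the INPUT() direction with RENAME= uses made-up datasets and values. The character ACCOUNT is renamed on the way in, so that a new numeric ACCOUNT can be created from it.

```sas
/* Hypothetical source where ACCOUNT arrived as character */
data source_char;
  account = '001234';
  balance = 150;
run;

data source_num;
  set source_char(rename=(account=account_c));  /* free up the name ACCOUNT   */
  account = input(account_c, 8.);               /* new numeric ACCOUNT: 1234  */
  drop account_c;                               /* discard the character copy */
run;
```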
Cynthia
