Data monthly.PD_score (keep=ID PD LGD Scorecard) length id $20. PD LGD 8. Scorecard $200.; format pd best12.;
Hi, could you let me know what best12. means? It feels like it takes the best 12 spaces, or digits including the decimal point if numeric,
or just the best 12 digits for a character variable.
What is the significance of best12. here, given that PD seems to be assigned the format 8. in the previous line?
Do I always have to specify the length? Can I run the program without it and leave it to the default?
Do I always have to define a variable in SAS, e.g. whether it is a string ($8) or numeric (8)?
@HeatherNewton wrote:
Do I always have to specify the length? Can I run the program without it and leave it to the default?
Do I always have to define a variable in SAS, e.g. whether it is a string ($8) or numeric (8)?
Thanks, I got it.
But what does "put MyVar = ;" do here?
data test;
length MyVar 8;
format MyVar best12.;
MyVar = 12345.12345;
put MyVar = ;
run;
MyVar is stored in 8 bytes, but displays 10 digits when printed.
SAS is a flexible language. You are not required to define a variable before using it, but then SAS will make its best guess at what type (and length) you meant the variable to be, based on how you first use it (or, more precisely, on the first place where it needs to define it). If you don't give it any clues, it will define the variable as numeric with 8 bytes of storage. If you only tell it that the variable is character, without any way for it to guess a length, it will define it as character, also with a length of 8 bytes. For example, in this program both NUM and CHAR will use 8 bytes of storage, but NUM will be a number and CHAR will be a character string.
data test;
input num char $;
cards;
1 high
2 low
;
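To see what SAS chose on its own, a small check like the one below can help. It is a sketch that rebuilds the same two-variable data under the hypothetical name WORK.DEFAULTS and uses the VTYPE and VLENGTH functions to write each variable's type and storage length to the log.

```sas
/* No LENGTH statement: SAS decides type and length from first use */
data work.defaults;
  input num char $;            /* NUM defaults to numeric, CHAR to $8 */
  num_type  = vtype(num);      /* 'N' for numeric */
  char_type = vtype(char);     /* 'C' for character */
  num_len   = vlength(num);    /* storage length in bytes */
  char_len  = vlength(char);
  put num_type= num_len= char_type= char_len=;
  cards;
1 high
2 low
;
run;
```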
Here is an example where not defining the variable can cause trouble.
data test;
input id age ;
if age > 65 then age_group='old';
else age_group='young';
cards;
1 20
2 66
;
Since you never defined AGE_GROUP before using it in the assignment statement, it will be defined as character with a length of only 3 bytes, because the string constant you first assign to it ('old') is only three bytes long. The value 'young' will then be truncated to 'you', since there is no room for the other two bytes.
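One way to avoid the truncation is to declare the variable with a LENGTH statement before its first use. The sketch below is a corrected version of the program above; the dataset name WORK.AGES and the length of 5 (just enough for 'young') are my own choices.

```sas
data work.ages;
  length age_group $5;         /* declared first, so 'young' fits in full */
  input id age;
  if age > 65 then age_group = 'old';
  else age_group = 'young';
  cards;
1 20
2 66
;
run;
```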
The = in that PUT statement means to write not only the value but also the name of the variable (and an equals sign) before it. So
put myvar= ;
will write
MyVar= 12345.12345
which needs only 11 of the 12 characters allocated by the BEST12. format. The decimal point counts toward the width of a numeric format.
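A quick way to see both points is to write the same value with and without the =, and to measure how much of the BEST12. width the value actually uses. This is a sketch; the dataset WORK.PUT_DEMO and the variables TEXT and USED are hypothetical names.

```sas
data work.put_demo;
  myvar = 12345.12345;
  format myvar best12.;
  put myvar;                   /* list output: the formatted value only */
  put myvar=;                  /* named output: variable name, '=', then the value */
  length text $12;
  text = put(myvar, best12.);  /* the 12-character formatted text, right-aligned */
  used = length(strip(text));  /* characters the value needs, decimal point included */
  put used=;
run;
```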
Data final; length region $10. product $20. sub_product $20. segment $40. sub_segment $50. segment_id $12. cv_95 8. CV_accurancy_95 $6.;
This is what I see in my code. It does look like the number of digits each variable requires is specified; otherwise, why bother specifying each one? If length is just storage and no format is specified, how many digits would each variable get in the above case?
data testlength;
   informat FirstName LastName $15. n1 6.2;
   input firstname lastname n1 n2;
   length name $25 default=4;
   name=trim(lastname)||', '||firstname;
   datalines;
Alexander Robinson 35 11
;
proc contents data=testlength;
run;
proc print data=testlength;
run;
In the code example above, does a character variable like NAME defined with length $25 take 25 bytes? Does it also mean 25 spaces on screen? For character variables, does length also mean the number of spaces on screen, while for numeric variables it doesn't and means bytes? And do we have to define a format like decimal(10,2), meaning at most 10 digits including 2 decimals and the decimal point itself...
length name $25 default=4;
The storage length of NAME is set to 25 bytes. This does not mean that NAME will always use 25 spaces on screen.
Additionally, this LENGTH statement sets the storage length for all newly created numeric variables to 4 (unless a variable is explicitly defined with another length). This is a suitable length for dates, but not good for larger integer values or values with fractions.
How a number is displayed will be determined by the format used. If no format is specified, SAS will use BEST12.
The ODS destination used will have a major impact on the look of your PROC PRINT result. ODS HTML will be very dynamic, while ODS LISTING will use a non-proportional font and fixed column lengths according to the defined length or the display format.
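The effect of DEFAULT=4 only shows up once the values have been written to the dataset and read back, because the DATA step itself always works with full 8-byte numbers. The sketch below (the dataset names WORK.LEN4 and WORK.CHECK4 and the test values are hypothetical) stores a date-sized integer and a larger integer in 4 bytes and then compares them with the original values.

```sas
data work.len4;
  length name $25 default=4;   /* newly created numeric variables get 4 bytes */
  name  = 'Robinson, Alexander';
  small = 22719;               /* a date-sized integer: fits exactly in 4 bytes */
  big   = 123456789;           /* needs more precision than 4 bytes can hold */
run;

data work.check4;
  set work.len4;               /* stored bytes are padded back to 8 when read */
  small_ok = (small = 22719);
  big_ok   = (big = 123456789);
  put small= big= small_ok= big_ok=;
run;
```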
Hi,
Have you tried looking at the best format documentation?
The number at the end of the format indicates the width the value is to occupy when displayed, which would be 12 places in this case.
The previous line, where the "8" appears, is part of a LENGTH statement, which sets how many bytes are used to hold the value internally; it is distinct from the FORMAT statement on the next line.
HTH.
Kind regards,
Amir.
Hi,
Also, when specifying a length there is no need to place a period (".") at the end of the length value. Perhaps this is where part of the confusion comes from(?)
Kind regards,
Amir.
The code will throw a syntax error because of a missing semicolon for the DATA statement.
Once that is fixed, start by making the code readable (this is what I always do when I get code from other people with crappy coding habits).
data monthly.PD_score (keep=ID PD LGD Scorecard);
length
ID $20
PD 8
LGD 8
Scorecard $200
;
format
PD best12.
;
Give each definition its own line; this lets you easily move it around in your code by moving complete lines. As others have already recommended, omit the dots in the LENGTH statement.
Also be consistent with naming: note that I used the same case for a variable everywhere it is used.
So if I assign 8 to PD, it means 8 bytes. Does it also mean it is 8 spaces on the screen? What is the significance of 8 bytes? Does that mean that if it is numeric there are no decimals? I have seen other people's code with things like 8.4, where I thought the 4 means decimal places, so the maximum we can display for 8.4 is
1234.678
as the decimal point also takes up one space.
Am I correct?
You are confusing how a variable is stored with how it is displayed. 8 bytes refers to the storage space on disk and in memory and has nothing to do with the number of digits that are displayed on screen. Run a simple test to prove it for yourself:
data test;
length MyVar 8;
format MyVar best12.;
MyVar = 12345.12345;
put MyVar = ;
run;
MyVar is stored in 8 bytes, but displays 10 digits when printed.
Thanks, I got it.
But what does "put MyVar = ;" do here?
data test;
length MyVar 8;
format MyVar best12.;
MyVar = 12345.12345;
put MyVar = ;
run;
MyVar is stored in 8 bytes, but displays 10 digits when printed.
No.
The FORMAT used to display a value has nothing to do with what the value is or how it is stored. It is just the instructions for how to display the value as text.
The storage length for a numeric variable specifies how many bytes of the 8 bytes required to represent a number using floating binary are saved to the disk. You can use fewer bytes to store the number (as low as 3 bytes on Unix/Windows machines), but that just means you lose those bits of precision on the value that is retrieved when the dataset is read back.
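To make this concrete for the 8.4 case: a w.d format such as 8.4 asks for a field 8 characters wide with 4 digits after the decimal point, so a value such as 123.4567 (3 digits, the decimal point, 4 decimals) fills it exactly. The storage length of the variable stays at 8 bytes regardless of the format. A sketch, with hypothetical dataset and variable names:

```sas
data work.wd_demo;
  x = 123.4567;
  format x 8.4;                /* display only: 8 characters wide, 4 decimals */
  length text $8;
  text  = put(x, 8.4);         /* 3 digits + decimal point + 4 decimals = 8 characters */
  bytes = vlength(x);          /* storage length, unaffected by the format */
  put x= text= bytes=;
run;
```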
It is probably easier to see how the display is independent of the storage by looking at how SAS handles DATE values. What it stores in the data is the number of days since 1 January 1960. But depending on which format you use, the same date value can be printed in many different ways. So today is day number 22,719.
%put %sysfunc(today(),comma12.);
22,719
Which you can display as 15MAR2022 or 15-MAR-2022 or 2022-03-15 or 15-03-2022 or 03-15-2022 or many other ways by using different display formats.
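For example, the sketch below stores the single value for 15MAR2022 and formats it several ways with the PUT function. DATE9., DATE11., YYMMDD10., DDMMYYD10. and MMDDYYD10. are standard SAS date formats; the dataset and variable names are hypothetical.

```sas
data work.date_demo;
  d = '15MAR2022'd;            /* stored as the number of days since 01JAN1960 */
  length f1-f5 $11;
  f1 = put(d, date9.);         /* 15MAR2022   */
  f2 = put(d, date11.);        /* 15-MAR-2022 */
  f3 = put(d, yymmdd10.);      /* 2022-03-15  */
  f4 = put(d, ddmmyyd10.);     /* 15-03-2022  */
  f5 = put(d, mmddyyd10.);     /* 03-15-2022  */
  put d= comma12. f1= f2= f3= f4= f5=;
run;
```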
You have to be careful not to let computer languages that use English-looking words like FORMAT and LENGTH trick you into thinking that the meaning of those terms in the language is exactly the same as in English, or as in any other computer language you might be used to. The terms in a language have the meaning that the language defines for them.
A FORMAT converts values into text. The BEST12. format specification says to display a numeric value using 12 characters. The BEST format will attempt to determine the "best" way to display that particular value in 12 characters. So a very large value might be shown in scientific notation, an integer value would omit the decimal point, and so on.
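A small sketch of how BEST12. picks a different layout for different values (the dataset and variable names are hypothetical). Only the properties noted in the comments are claimed here; the exact text for the large value is left to SAS.

```sas
data work.best_demo;
  length t_int t_frac t_big $12;
  t_int  = put(42, best12.);           /* integer: no decimal point */
  t_frac = put(12345.12345, best12.);  /* fraction: decimal point included */
  t_big  = put(1e20, best12.);         /* too many digits for 12 characters: E notation */
  put t_int= t_frac= t_big=;
run;
```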
The LENGTH statement determines the storage length used when writing the observations into the SAS dataset. It does not determine how the values will be displayed. There is no need to include a period in the lengths since they can only be integers.
Since SAS uses 64-bit floating point numbers, you should generally always use 8 as the length for numbers. The $ indicates that the variable is character. Since SAS stores character strings as fixed length, the storage length is the number of bytes that the variable can contain. If you are using a single-byte encoding, like WLATIN1, then the number of bytes of storage matches the number of characters that can be stored. But if you are using an encoding like UTF-8, where some characters require more than one byte, the number of characters that will fit in the storage length depends on which characters they are.
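To see that a character variable's storage length is fixed while its content can be shorter, the sketch below compares VLENGTH (the storage length) with LENGTH (the position of the last non-blank character). The names are hypothetical, and the value is plain ASCII so the result should not depend on the session encoding.

```sas
data work.char_demo;
  length name $25;               /* 25 bytes of storage in every observation */
  name = 'Robinson, Alexander';  /* 19 characters; SAS pads the rest with blanks */
  storage = vlength(name);       /* the declared storage length */
  used    = length(name);        /* trailing blanks are not counted */
  put storage= used=;
run;
```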