HeatherNewton
Quartz | Level 8
Data monthly.PD_score (keep=ID PD LGD Scorecard)
length id $20. PD LGD 8. Scorecard $200.;
format pd best12.;

Hi, could you let me know what best12. means? It feels like it takes the best 12 spaces or digits (including the decimal point) if the variable is numeric,

or just the best 12 digits for a character variable.

What is the significance of best12. here, given that PD seems to be assigned format 8. in the previous line?

JosvanderVelden
SAS Super FREQ
The number 8 in the LENGTH statement specifies the number of bytes used to store the values of a numeric variable. In this case it stores PD and LGD in 8 bytes each. The BESTw. format, used in the FORMAT statement, attempts to write numbers in a way that balances the conflicting requirements of readability, precision, and brevity. See the documentation for details:

best.-format documentation: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/leforinforref/p1fum54c93f8r0n1wrs5mrb05nzi.ht...

length statement documentation: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/p1hgqgmxm3dpqcn1d4w5za5qbz0d.htm

Best regards, Jos
HeatherNewton
Quartz | Level 8

Do I always have to specify the length? Can I run the program without it and leave it to the default?

Do I always have to define a variable in SAS, e.g. whether it is a string ($8) or numeric (8)?

Tom
Super User

@HeatherNewton wrote:

Do I always have to specify the length? Can I run the program without it and leave it to the default?

Do I always have to define a variable in SAS, e.g. whether it is a string ($8) or numeric (8)?

 

Thanks, I got it.

But what does "put MyVar = ;" do here?

data test;
  length MyVar 8;
  format MyVar best12.;
  MyVar = 12345.12345;
  put MyVar = ;
run;

MyVar is stored in 8 bytes, but displays 10 digits when printed.


SAS is a flexible language. You are not required to define a variable before using it, but then SAS will make its best guess at what type (and length) of variable you meant, based on how you first use it (or at least at the first place where it needs to define it). If you don't give it any clues, it will define the variable as numeric with 8 bytes of storage. If you only tell it that the variable is character, without any way for it to guess a length, it will define it as character, also with a length of 8 bytes. For example, in this program both NUM and CHAR use 8 bytes of storage, but NUM is a number and CHAR is a character string.

data test;
  input num char $;
cards;
1 high
2 low
;
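A quick way to check what SAS decided is to ask it. This is a minimal sketch, assuming a SAS 9.4 session: it repeats the step above and uses the VTYPE and VLENGTH functions to write each variable's type and storage length to the log (PROC CONTENTS would show the same information).

```sas
/* Same input as above: no LENGTH statement, so SAS picks the defaults */
data test;
  input num char $;
cards;
1 high
2 low
;

/* VTYPE returns N or C; VLENGTH returns the variable's length in bytes */
data _null_;
  set test(obs=1);
  type_num  = vtype(num);
  len_num   = vlength(num);
  type_char = vtype(char);
  len_char  = vlength(char);
  /* Expect N and 8 for NUM, C and 8 for CHAR */
  put type_num= len_num= type_char= len_char=;
run;
```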

Here is an example where not defining the variable can cause trouble.

data test;
  input id age ;
  if age > 65 then age_group='old';
  else age_group='young';
cards;
1 20
2 66
;

Since you never defined AGE_GROUP before using it in the assignment statement, it will be defined as character with a length of only 3 bytes, because the string constant you first assign to the variable is only three bytes long. Then the value 'young' will be truncated to 'you', since there is no room for the other two bytes.
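One way to avoid the truncation is to declare AGE_GROUP with a LENGTH statement before its first use. This sketch sizes it at $5 to fit 'young'; that length is an assumption based only on the two values used in the example.

```sas
data test;
  input id age;
  length age_group $5;   /* declared before the first assignment, wide enough for 'young' */
  if age > 65 then age_group = 'old';
  else age_group = 'young';
cards;
1 20
2 66
;
```

Because the LENGTH statement comes after INPUT, AGE_GROUP still ends up as the last column; what matters is only that it appears before the first assignment.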

 

The = in that PUT statement means to write the name of the variable in front of the value, not just the value. So

put myvar= ;

will write 

MyVar= 12345.12345

which needs only 11 of the 12 characters allocated by the BEST12. format. The decimal point counts toward the width of a numeric format.
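To see list output and named output side by side, here is a small sketch. The explicit 8.2 on the last PUT is only there to show that a format given in the PUT statement overrides the one attached by the FORMAT statement.

```sas
data _null_;
  format MyVar best12.;
  MyVar = 12345.12345;
  put MyVar;          /* list output: just the formatted value */
  put MyVar=;         /* named output: the name, an equals sign, then the value */
  put MyVar= 8.2;     /* a format in the PUT statement overrides the attached BEST12. */
run;
```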

HeatherNewton
Quartz | Level 8
Data final;
length
region $10.
product $20.
sub_product $20.
segment $40.
sub_segment $50.
segment_id $12.
cv_95 8. CV_accurancy_95 $6.;

This is what I see in my code. It does look like the number of digits each variable requires is specified; otherwise, why bother specifying each one? If length is just storage and no format is specified, how many digits would each variable get in the above case?

HeatherNewton
Quartz | Level 8
data testlength;
   informat FirstName LastName $15. n1 6.2;
   input firstname lastname n1 n2;
   length name $25 default=4;
   name=trim(lastname)||', '||firstname;
   datalines;
Alexander Robinson 35 11
;
proc contents data=testlength;
run;
proc print data=testlength;
run;

In the code example above, does defining a character variable like NAME with length $25 mean it is 25 bytes? Does it also mean 25 spaces on screen? For character variables, does length also mean the number of spaces on screen, while for numeric variables it does not and just means bytes? And do we have to define a format like decimal(10,2), meaning at most 10 digits including 2 decimals and the decimal point itself?

 

Kurt_Bremser
Super User
length name $25 default=4;

The storage length of name is set to 25 bytes. This does not mean that name will always use 25 spaces on screen:

  • You may have UTF-8 data, where one displayed character uses up to 4 bytes of storage
  • A format may reduce the characters shown
  • Display in HTML or similar will automatically adapt the column to the widest value
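A sketch of the second point: the variable below is stored with 25 bytes, but the $8. format in PROC PRINT displays only the first 8 characters. The data set name NAMES and the sample value are made up for illustration.

```sas
data names;                     /* illustrative data set and value */
  length name $25;              /* storage: 25 bytes */
  name = 'Robinson, Alexander';
run;

proc print data=names;
  format name $8.;              /* display only: shows the first 8 characters */
run;
```

The stored value is unchanged; only the printed column is shorter.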

Additionally, this LENGTH statement sets the storage length for all newly created numeric variables to 4 (unless a variable is explicitly defined with another length). This is a suitable length for dates, but not good for larger integer values or values with fractions.
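To check which length each numeric variable actually got, you can run PROC CONTENTS after the step. In this sketch (the data set DEFAULT_LEN and its variables are illustrative), SMALL picks up the DEFAULT=4 length while WIDE keeps the 8 bytes it is given explicitly.

```sas
/* Illustrative names: SMALL gets no explicit length, WIDE is given 8 bytes */
data default_len;
  length label $10 wide 8 default=4;
  small = 42;      /* numeric, no length given: DEFAULT=4 applies */
  wide  = 42;      /* explicit length 8 wins over DEFAULT= */
  label = 'demo';
run;

/* The variable listing should show Len 4 for SMALL and 8 for WIDE */
proc contents data=default_len;
run;
```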

How a number is displayed will be determined by the format used. If no format is specified, SAS will use BEST12.

 

The ODS destination used will have a major impact on the look of your PROC PRINT result. ODS HTML will be very dynamic, while ODS LISTING will use a monospaced font and fixed column widths based on the defined length or the display format.

Amir
PROC Star

Hi,

 

Have you tried looking at the best format documentation?

 

The number at the end of the format indicates the width the value is to occupy when displayed, which would be 12 places in this case.

 

The previous line, where the "8" appears, is part of a LENGTH statement, which sets how many bytes are used to hold the value internally. It is distinct from the FORMAT statement on the next line.

 

HTH.

 

 

Kind regards,

Amir.

Amir
PROC Star

Hi,

 

Also, when specifying a length there is no need to place a period (".") at the end of the length value. Perhaps this is where part of the confusion comes from(?)

 

 

Kind regards,

Amir.

Kurt_Bremser
Super User

The code will throw a syntax error because of a missing semicolon for the DATA statement.

Once that is fixed, start by making code readable (this is what I always do when I get code from other people with crappy coding habits).

data monthly.PD_score (keep=ID PD LGD Scorecard);
length
  ID $20
  PD 8
  LGD 8
  Scorecard $200
;
format
  PD best12.
;

Give each definition its own line; this makes it easy to move definitions around in your code by moving complete lines. As already recommended by others, omit the dots in the LENGTH statement.

Also be consistent with naming. Note that I used the same case for a variable in all places where it is used.

HeatherNewton
Quartz | Level 8

So if I assign 8 to PD, it means 8 bytes. Does it also mean it is 8 spaces on the screen? What is the significance of 8 bytes? Does that mean that, if it is numeric, there are no decimals? I have seen other people's code with things like 8.4, where I thought 4 means decimal places, so the maximum we can display for 8.4 is

1234.678

as the decimal point also takes up one space.

Am I correct?

SASKiwi
PROC Star

You are confusing how a variable is stored with how it is displayed. 8 bytes refers to the storage space on disk and in memory and is nothing to do with the number of digits that are displayed on screen. Run a simple test to prove it for yourself:

data test;
  length MyVar 8;
  format MyVar best12.;
  MyVar = 12345.12345;
  put MyVar = ;
run;

MyVar is stored in 8 bytes, but displays 10 digits when printed.

 

HeatherNewton
Quartz | Level 8

Thanks, I got it.

But what does "put MyVar = ;" do here?

data test;
  length MyVar 8;
  format MyVar best12.;
  MyVar = 12345.12345;
  put MyVar = ;
run;

MyVar is stored in 8 bytes, but displays 10 digits when printed.

 

Tom
Super User

No. 

 

The FORMAT used to display a value has nothing to do with what the value is or how it is stored. It is just the instructions for how to display the value as text.  
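To make that concrete for the 8.4 question, here one stored value is written with several formats. This is a sketch with an arbitrary value (123.4567), chosen so that it fits exactly in the 8 characters of the 8.4 format: three digits, the decimal point, and four decimals.

```sas
data _null_;
  x = 123.4567;      /* one stored value, 8 bytes in memory */
  put x= 8.4;        /* width 8 with 4 decimals: 123.4567 */
  put x= 8.2;        /* same value rounded to 2 decimals for display */
  put x= best12.;    /* let BEST choose, within 12 characters */
  put x= comma10.2;  /* thousands separators, 2 decimals */
run;
```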

 

The storage length for a numeric variable specifies how many of the 8 bytes required to represent a number in floating-point binary are saved to disk. You can use fewer bytes to store the number (as low as 3 bytes on Unix/Windows machines), but that just means you lose those bits of precision in the value that is retrieved when the data set is read back.
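The precision loss is easy to demonstrate. This sketch, with illustrative names (PRECISION, EXACT, SHORT), stores the same values in an 8-byte and a 4-byte variable and compares them after reading the data back. On Windows and UNIX, 2,097,152 is documented as the largest integer a length of 4 holds exactly, so the value one above it, like the fraction 0.1, should not survive the round trip.

```sas
/* EXACT keeps all 8 bytes; SHORT keeps only 4 (names are illustrative) */
data precision;
  length exact 8 short 4;
  input exact;
  short = exact;   /* full precision in memory; truncated when the observation is written */
datalines;
2097152
2097153
0.1
;

/* Read the data back: any lost bits now show up as a difference */
data _null_;
  set precision;
  diff = exact - short;
  put exact= best16. short= best16. diff= best16.;
run;
```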

 

It is probably easier to see how the display is independent of the storage by looking at how SAS handles DATE values. What it stores in the data is the number of days since 1960 (day 0 is 01JAN1960). But depending on which format you use to display the date value, it can be printed in many different ways. So today is day number 22,719.

%put %sysfunc(today(),comma12.);
 22,719

Which you can display as 15MAR2022 or 15-MAR-2022 or 2022-03-15 or 15-03-2022 or 03-15-2022 or many other ways by using different display formats.
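A sketch of that idea, using a date literal for the same day: one stored number, written with several standard SAS date formats. The comments show what each format should produce.

```sas
data _null_;
  d = '15mar2022'd;   /* stored as the number of days since 01JAN1960 */
  put d=;             /* no format attached: the raw day count, 22719 */
  put d= date9.;      /* 15MAR2022 */
  put d= date11.;     /* 15-MAR-2022 */
  put d= yymmdd10.;   /* 2022-03-15 */
  put d= ddmmyyd10.;  /* 15-03-2022 (the D means dash separators) */
  put d= mmddyyd10.;  /* 03-15-2022 */
run;
```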

 

You have to be careful not to get confused by computer languages using words like FORMAT and LENGTH that look like English words into thinking that the meaning of those terms in the computer language is exactly the same as it is in English.  Or in any other computer language you might be used to.  The terms in a language have the meaning that the language defines for them.

Tom
Super User

A FORMAT converts values into text.  The BEST12. format specification says to display a numeric value using 12 characters.  The BEST format will attempt to determine the "best" way to display that particular value in 12 characters.  So a larger value might use scientific notation.  An integer value would omit the decimal point. etc.
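A small sketch of BEST12. choosing a different layout for each value; the values are arbitrary examples. Rather than guess the exact log output, it is worth running it and comparing the integer, the fraction, and the 15-digit number, which cannot fit in 12 characters without E-notation.

```sas
data _null_;
  format x best12.;  /* attach BEST12. so PUT X= uses it */
  do x = 42, 12345.12345, 1/3, 123456789012345;
    put x=;          /* integer, decimal, or E-notation, whichever fits 12 characters best */
  end;
run;
```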

 

The LENGTH statement determines the storage length used when writing the observations into the SAS data set. It does not determine how the values will be displayed. There is no need to include a period in the lengths, since they can only be integers.

 

Since SAS uses 64-bit floating-point numbers, you should generally always use 8 as the length for numbers. The $ indicates that the variable is character. Since SAS stores character strings as fixed length, the storage length is the number of bytes that the variable can contain. If you are using a single-byte encoding, like WLATIN1, then the number of bytes of storage matches the number of characters that can be stored. But if you are using an encoding like UTF-8, where some characters require more than one byte, the number of characters that will fit in the storage length depends on which characters they are.
