Hello,
Does anyone know the length needed for a numeric value that is one of the special missing values, .A - .Z? Integers between -8192 and 8192 can be stored in a numeric variable that is only length 3. Special missing values appear to work fine with length = 3, too, but I want to get a definitive answer.
Warm regards,
Michael
They work fine. Try it yourself.
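data test;
  /* one numeric variable for each allowed length, 3 through 8 bytes */
  length x3 3 x4 4 x5 5 x6 6 x7 7 x8 8;
  array x x3-x8;   /* implicit array, needed for DO OVER */
  do y = ., .z, .a, ._;
    do over x;
      x = y;
    end;
    output;
  end;
run;

/* read the stored values back and print them to the log */
data _null_;
  set test;
  put (_numeric_) (=);
run;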
x3=. x4=. x5=. x6=. x7=. x8=. y=.
x3=Z x4=Z x5=Z x6=Z x7=Z x8=Z y=Z
x3=A x4=A x5=A x6=A x7=A x8=A y=A
x3=_ x4=_ x5=_ x6=_ x7=_ x8=_ y=_
NOTE: There were 4 observations read from the data set WORK.TEST.
3 is the minimum, unless you're on an IBM mainframe, where I think it's 2!
I think that @Kastchei knows that SAS can save the special missing values in 3 bytes. As I read it, he wants a formal answer on how many bytes are needed, or on how the data is stored.
I haven't seen any documentation about exactly how they are stored.
SAS has documentation on how many bytes are needed to store an integer (the Maximum Integer Size table), but it does not mention how special missing values are stored. The same page mentions the special missing values, but says nothing about how they are stored.
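To see the documented integer limit in action, here is a small sketch. The dataset and variable names are mine, not from the documentation, and it assumes an IEEE platform such as Windows or UNIX, where length 3 is documented to hold integers exactly up to 8,192. The truncation only happens when the observation is written to the dataset, so the values have to be read back in a second step.

```sas
data ints;
  length n3 3;               /* shortest length allowed on Windows/UNIX */
  do n8 = 8191, 8192, 8193;
    n3 = n8;                 /* still a full 8 bytes in the PDV here */
    output;                  /* n3 is truncated to 3 bytes on write */
  end;
run;

data _null_;
  set ints;
  same = (n3 = n8);          /* 1 if the value survived, 0 if not */
  put n8= n3= same=;
run;
```

I would expect same=1 for 8191 and 8192 and same=0 for 8193. Special missing values such as .a or .z can be added to the DO list to check them the same way.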
So, I think there might not be an official answer. Instead we just have to hope that it works with 3 bytes and that they don't change anything in the future. 🙂
I was asserting that it wasn't anything to worry about. From memory, they're stored in the length assigned, but only take up three bytes. Any length longer will just pad the extra with '00'x. That's in Windows - other operating systems will do it differently - big- and little-endian, and that sort of thing.
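One way to look at the stored bytes yourself is the HEX16. format, which prints the 8-byte floating-point representation of a numeric value as 16 hex digits. This is only a sketch: the exact bit patterns are platform-dependent, and I am not quoting documented values here.

```sas
data _null_;
  do y = ., ._, .a, .z;
    /* HEX16. shows the 8-byte internal representation as 16 hex digits */
    put y= @8 y hex16.;
  end;
run;
```

If the description above holds, the nonzero hex digits should all sit in the leading three bytes, with zeros after them, which is why truncating to length 3 loses nothing.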
SAS wouldn't dare change it, because all the code in the world would break!
If your objective is to conserve disk space, I suspect you will get better mileage by compressing your SAS data. Setting a SAS default of COMPRESS=YES or COMPRESS=BINARY in all SAS sessions is now quite a common practice. Then you don't need to worry about optimising variable lengths at all.
Yeah, most of the time I don't worry about it and just use compression. But shortening the length often saves more than compression does (sometimes both help), so on really big datasets, I try to use both.
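A minimal sketch of using both together. WORK.BIG and its variables are made up for illustration; the point is only that the COMPRESS= system option sets a session default, the COMPRESS= data set option overrides it for one dataset, and a LENGTH statement placed before SET shortens the stored numeric variables.

```sas
/* session-wide default; COMPRESS=BINARY is the other setting mentioned above */
options compress=yes;

/* hypothetical test data: small integers plus a special missing value */
data work.big;
  do id = 1 to 100000;
    flag  = mod(id, 2);
    score = mod(id, 100);
    if score = 0 then score = .a;
    output;
  end;
run;

data work.big_small (compress=binary);  /* overrides the session default */
  length flag 3 score 3;                /* safe only for integers in the length-3 range */
  set work.big;
run;
```

ID is left at length 8 on purpose, since its values go well past what length 3 can hold exactly.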
Good to hear you are using compression. My personal approach is that it is cheaper and quicker to not bother with changing numeric variable lengths as disk space is plentiful and nowadays relatively cheap.
I generally concur, but not completely. And I say this as a big fan of compression.
If the PDV is predominantly numeric (especially with few missing values), or otherwise contains short, fully populated character variables, compression may well increase the size of the dataset. The extra I/O needed to read the larger-than-necessary dataset, plus the cost of decompressing it, can then extend the run time considerably.
My rule of thumb is: if compression saves less than 30%, don't bother.
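To apply a rule like this you need the percentage, and SAS writes it to the log as a NOTE when a compressed dataset is created. The sketch below builds a made-up dataset of fully populated numeric variables, the kind of data described above as a poor candidate, so the log NOTE can be checked against the threshold. The dataset name, sizes, and seed are arbitrary.

```sas
data work.nums (compress=yes);
  array v{20} v1-v20;
  call streaminit(1);             /* fixed seed so reruns are repeatable */
  do i = 1 to 100000;
    do j = 1 to 20;
      v{j} = rand('uniform');     /* full-precision, non-missing numerics */
    end;
    output;
  end;
  drop i j;
run;
```

After the step runs, the log NOTE says by what percentage compression decreased or increased the dataset size. Rerunning with COMPRESS=BINARY, or with real data, gives the number to compare against the 30% cut-off.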