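data tom;
length ID $ 3 Name $ 15;   /* character lengths set explicitly */
input ID $ Score1-Score3 Name $;   /* Score1-Score3 are numeric, 8 bytes each */
datalines;
1 90 95 98 Ram
2 78 77 75 Tom
3 88 91 92 Sham
;
run;
proc print data=tom noobs;
run;
proc contents data=tom;   /* Observation Length is reported as 48 */
run;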
Because SAS groups the numeric and character variables in the observation and aligns them to start on a multiple of 8 bytes.
Since the total length of your character variables is between 16 and 24 bytes they end up taking 24 bytes to allow for the alignment.
Change the length of name to $13 from $15 and you will see that the observation length drops from 48 to 40.
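A minimal sketch of that check, assuming the WORK library and a hypothetical dataset name TOM13: it rebuilds the same data with Name shortened to $13 and reads the observation length from the DICTIONARY.TABLES view (column OBSLEN) for both datasets. Per the explanation above, TOM should show 48 and TOM13 should show 40.

```sas
/* Same data as the question, but Name is $13 instead of $15.       */
/* TOM13 is a hypothetical dataset name used only for this check.   */
data tom13;
  length ID $ 3 Name $ 13;
  input ID $ Score1-Score3 Name $;
  datalines;
1 90 95 98 Ram
2 78 77 75 Tom
3 88 91 92 Sham
;
run;

/* DICTIONARY.TABLES stores library and member names in uppercase */
proc sql;
  select memname, obslen
    from dictionary.tables
    where libname = 'WORK' and memname in ('TOM', 'TOM13');
quit;
```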
@Tom, thank you very much for the very clear explanation.
@mdoddala wrote:
@Tom could you please explain it more clearly? I’m not able to interpret what you said. Thank you:)
You can write the PROC CONTENTS output to a dataset and look at the NPOS variable.
proc contents data=tom noprint out=contents; run;
proc print data=contents;
var name varnum npos type length ;
run;
Result:
Obs  NAME    VARNUM  NPOS  TYPE  LENGTH
  1  ID           1    24     2       3
  2  Name         2    27     2      15
  3  Score1       3     0     1       8
  4  Score2       4     8     1       8
  5  Score3       5    16     1       8
So looking at that it appears that the real explanation is that the Observation Length is always a multiple of 8 bytes.
The numeric variables appear first and then the character variables, even though the VARNUM order is different.
Try different combinations of variables and see how SAS modifies the NPOS and Observation Length.
While you are at it, try using some numeric variables with lengths between 3 and 8.
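A sketch of such an experiment, with hypothetical names: dataset MIXED holds a $5 character variable C, a 4-byte numeric N4 and a default 8-byte numeric N8. Listing NPOS from the PROC CONTENTS output shows where SAS placed each one. No result is claimed here; the point is to run it and compare the positions with the table above.

```sas
/* MIXED, C, N4 and N8 are hypothetical names for this experiment.  */
/* Numeric lengths of 3 to 8 are allowed on Windows and UNIX.       */
data mixed;
  length C $ 5 N4 4 N8 8;
  C  = 'abc';
  N4 = 1;
  N8 = 2;
run;

/* Write variable metadata to a dataset and list physical positions */
proc contents data=mixed noprint out=mixed_contents;
run;

proc print data=mixed_contents noobs;
  var name varnum npos type length;
run;
```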
Hi @Tom ,
Could you please explain the use of NPOS column?
When I checked the SAS documentation, it says NPOS is the physical position of each column. I am wondering why the NPOS value of the Score1 variable is 0?
Because all counting starts at 0.
The physical position is counted the way that is typical in computing, starting at zero. That's why the second numeric variable starts at 8 and not at 9.
Address of variable within a dataset page = starting address of observation + npos. This is, of course, different from the way string positions are handled in SAS code, where the first position is defined as 1.
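Since each variable occupies bytes NPOS through NPOS+LENGTH-1 of the observation, the NPOS values can be turned into an estimate of the observation length. A sketch, assuming WORK.TOM from the first program exists and assuming the total is rounded up to a multiple of 8 (the rule observed in this thread when 8-byte numerics are present; the all-character case later in the thread is not rounded). For TOM the last used byte is 27 + 15 = 42, which rounds up to 48.

```sas
/* Assumes WORK.TOM from the first program in this thread exists */
proc contents data=tom noprint out=contents;
run;

data _null_;
  set contents end=last;
  retain last_byte 0;
  /* This variable occupies bytes NPOS .. NPOS+LENGTH-1 (0-based) */
  last_byte = max(last_byte, npos + length);
  if last then do;
    /* Round up to the next multiple of 8 (assumed alignment rule) */
    est_obslen = ceil(last_byte / 8) * 8;
    put last_byte= est_obslen=;
  end;
run;
```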
You need to see the distinction between SAS language, which is heavily targeted towards non-programmer users and "everyday thinking", and internal values for technical means, where the values as used in CPU registers are important.
Let me explain on behalf of @Tom; please feel free to correct me, sir, if I am wrong.
Name is 15 bytes + 3 bytes for ID, which makes 18 bytes.
But whenever you have character variables along with numeric variables in a row, the row has to be stored in blocks of 8 bytes.
So even though you have only 18 bytes, they do not fit in 8 + 8 = 16, so they take 24, i.e. 6 extra padding bytes for those 2 variables.
Result: 24 (18 is not a multiple of 8, so it goes up to 24) + 24 (bytes for Score1 to Score3, i.e. 3 x 8) = 48.
If you make Name 13 bytes (length of 13) + 3 bytes for ID, that makes 16 bytes, which is already a multiple of 8, so no padding is needed.
Result: 16 + 24 (bytes for Score1 to Score3, i.e. 3 x 8) = 40.
Hope this is clear.
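The same arithmetic written as a single formula, assuming the rule described in this thread: the numeric bytes and character bytes are added and the total is rounded up to a multiple of 8.

```latex
\text{obslen}_{\$15} = 8\left\lceil \frac{3 \times 8 + 3 + 15}{8} \right\rceil
  = 8\left\lceil \frac{42}{8} \right\rceil = 48,
\qquad
\text{obslen}_{\$13} = 8\left\lceil \frac{3 \times 8 + 3 + 13}{8} \right\rceil
  = 8\left\lceil \frac{40}{8} \right\rceil = 40
```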
Hi @mdoddala Sorry for the late response as I was away from my pc.
OK, I honestly think your question is excellent, and you beat me on this one. I just read this in the SAS documentation:
"Observations within a SAS data set are aligned on double-byte boundaries whenever possible. As a result, 8-byte and 4-byte numeric variables are positioned at 8-byte boundaries at the front of the data set and followed by character variables in the order in which they are encountered. If the data set only contains 4-byte numeric data, the alignment is based on 4-byte boundaries. Since numeric doubles can be operated upon directly rather than being moved and aligned before doing comparisons or increments, the boundaries cause better performance." Ref: SAS datasets doc
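The quoted passage says the alignment changes when a dataset contains only 4-byte numerics. A sketch to observe that, with hypothetical dataset names FOUR (only 4-byte numerics plus one character variable) and FOUR_EIGHT (the same plus one 8-byte numeric). No expected values are claimed beyond the fact that each observation must hold at least the sum of its variable lengths (11 and 19 bytes).

```sas
/* FOUR and FOUR_EIGHT are hypothetical dataset names.            */
/* FOUR: only 4-byte numerics A and B plus a $3 character C.      */
data four;
  length A B 4 C $ 3;
  A = 1; B = 2; C = 'x';
run;

/* FOUR_EIGHT: same, plus an 8-byte numeric D for comparison */
data four_eight;
  length A B 4 D 8 C $ 3;
  A = 1; B = 2; D = 3; C = 'x';
run;

proc sql;
  select memname, obslen
    from dictionary.tables
    where libname = 'WORK' and memname in ('FOUR', 'FOUR_EIGHT');
quit;
```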
But I'm not fully convinced, as it really needs a strong understanding of data structure alignment and of the formulas that give the number of padding bytes.
I'll dig into this concept inside out later and share my understanding; on the other hand, if you happen to learn it faster, kindly share your knowledge. Great question indeed, and thank you.
Hi @novinosrin, so what I understood is that numeric variables have a default length of 8 bytes, whereas the length of character variables is set in the program. In this program the length of ID is 3, the length of Name is 15, and Score1, Score2 and Score3 are 8 bytes each, which gives a total of 3 + 15 + 8 + 8 + 8 = 42 bytes. But the observation length is always shown as a multiple of 8, so 42 is rounded up to 48 bytes in the results. Please let me know if I'm wrong. Thank you so much, man, for being very responsive. In the last few days I got really lazy about learning SAS programming ;)
15 + 3 = 18
so it cannot fit in a multiple of 8; as it is left with 2 more bytes, it has to go up to 24. You need to think in terms of the row: how a row is organized and what the rules for organizing a row are. Those rules change when you add a numeric value to your row. Run the example below.
/* Only character variables: the observation length is 10, */
/* simply the sum of the lengths (5 + 5)                    */
data a;
length h i $5;
h = "kj";
i = "jj";
run;
proc contents data=a;
run;
/* Observation length is 24: see the impact of having character
variables along with a numeric variable in a row */
data a;
length h i $5;
h = "kj";
i = "jj";
c = 8;
run;
/* Once a numeric variable is added to the row, the row has to be
arranged in 8-byte units: c takes 8 bytes, and the 10 bytes of
character variables no longer end on an 8-byte boundary
(8 + 10 = 18), so the row is padded up to 24 bytes */
proc contents data=a;
run;
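To see where those 24 bytes go, the NPOS listing Tom used earlier can be applied to this second dataset A. Based on the numerics-first ordering in Tom's output, c would be expected at NPOS 0 with h and i after it (8 and 13), ending at byte 18 before padding; treat this as an expectation to check rather than a quoted result.

```sas
/* Assumes the second version of WORK.A (with numeric c) was just created */
proc contents data=a noprint out=a_contents;
run;

/* NPOS is the 0-based byte offset of each variable in the observation */
proc print data=a_contents noobs;
  var name varnum npos type length;
run;
```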