mdoddala
Obsidian | Level 7
data tom;
length ID $ 3 Name $ 15;
input ID $ Score1-Score3 Name $;
datalines;
1 90 95 98 Ram
2 78 77 75 Tom
3 88 91 92 Sham
;
proc print noobs;
proc contents data=tom;
run;
Tom
Super User

Because SAS groups the numeric and character variables in the observation and aligns them to start on a multiple of 8 bytes.

 

Since the total length of your character variables (3 + 15 = 18 bytes) is between 16 and 24 bytes, they end up taking 24 bytes to allow for the alignment.

 

Change the length of name to $13 from $15 and you will see that the observation length drops from 48 to 40.
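
A quick way to check this, as a sketch: the step below is the original program with only the length of Name changed. The dataset name tom13 is made up for illustration. If the explanation above holds, PROC CONTENTS should report an Observation Length of 40 instead of 48.

```sas
/* Same program as the original post, but Name is $13 instead of $15. */
/* TOM13 is a hypothetical dataset name used only for this check.     */
data tom13;
  length ID $ 3 Name $ 13;
  input ID $ Score1-Score3 Name $;
datalines;
1 90 95 98 Ram
2 78 77 75 Tom
3 88 91 92 Sham
;
run;

/* Compare "Observation Length" here with the value reported for TOM. */
proc contents data=tom13;
run;
```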

kiranv_
Rhodochrosite | Level 12

@Tom thank you very much for the very clear explanation.

mdoddala
Obsidian | Level 7
@Tom could you please explain it more clearly? I’m not able to interpret what you said. Thank you:)
Tom
Super User

@mdoddala wrote:
@Tom could you please explain it more clearly? I’m not able to interpret what you said. Thank you:)

You can write the PROC CONTENTS output to a dataset and look at the NPOS variable.

proc contents data=tom noprint out=contents; run;

proc print data=contents;
 var name varnum npos type length ;
run;

Result:

Obs    NAME      VARNUM    NPOS    TYPE    LENGTH

 1     ID           1       24       2        3
 2     Name         2       27       2       15
 3     Score1       3        0       1        8
 4     Score2       4        8       1        8
 5     Score3       5       16       1        8

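So, looking at that, it appears that the real explanation is that the Observation Length is rounded up to a multiple of 8 bytes: the last variable ends at byte 42 (NPOS 27 + LENGTH 15), and the next multiple of 8 is 48.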

 

The numeric variables appear first and then the character variables, even though the VARNUM order is different.

 

Try different combinations of variables and see how SAS modifies the NPOS and Observation Length.

While you are at it try using some numeric variables of length between 3 and 8.
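
One possible experiment along those lines, as a sketch only: the step below stores the three scores in 4 bytes each and writes the layout to a dataset so that NPOS can be inspected. The names tom_short and contents_short are made up for illustration, and no particular result is claimed here.

```sas
/* Hypothetical variant of the original step: Score1-Score3 are */
/* stored in 4 bytes instead of the default 8.                  */
data tom_short;
  length ID $ 3 Name $ 15 Score1-Score3 4;
  input ID $ Score1-Score3 Name $;
datalines;
1 90 95 98 Ram
2 78 77 75 Tom
3 88 91 92 Sham
;
run;

/* Write the variable layout to a dataset and list the offsets. */
proc contents data=tom_short noprint out=contents_short;
run;

proc print data=contents_short;
  var name varnum npos type length;
run;
```

Comparing NPOS and the Observation Length from this run with the listing for the original dataset shows how the layout reacts to shorter numeric variables.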

 

 

 

subhroster
Fluorite | Level 6

Hi @Tom ,

 

Could you please explain the use of NPOS column? 

When I checked the SAS documentation it says NPOS is the physical position in each column. I am wondering why the NPOS value of score1 variable is 0?

Subhro Kar
www.9to5sas.com
andreas_lds
Jade | Level 19

@subhroster wrote:

Hi @Tom ,

 

Could you please explain the use of NPOS column? 

When I checked the SAS documentation it says NPOS is the physical position in each column. I am wondering why the NPOS value of score1 variable is 0?


Because all counting starts at 0.

Kurt_Bremser
Super User

The physical position is counted in the way typical of computing, starting at zero. That's why the second variable starts at 8, and not at 9.

Address of variable within a dataset page = starting address of observation + npos. This is, of course, different from the way string positions are handled in SAS code, where the first position is defined as 1.

You need to see the distinction between the SAS language, which is heavily targeted towards non-programmer users and "everyday thinking", and internal values used for technical purposes, where the values as used in CPU registers are important.
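
To make the zero-based offset concrete, here is a minimal sketch that converts NPOS into the 1-based byte positions used elsewhere in SAS code. The dataset name positions and the variables first_byte and last_byte are made up for illustration.

```sas
/* Recreate the dataset from the original post. */
data tom;
  length ID $ 3 Name $ 15;
  input ID $ Score1-Score3 Name $;
datalines;
1 90 95 98 Ram
2 78 77 75 Tom
3 88 91 92 Sham
;
run;

proc contents data=tom noprint out=contents;
run;

/* NPOS is a zero-based offset, so adding 1 gives the 1-based */
/* position of a variable's first byte within the row.        */
data positions;
  set contents(keep=name npos length);
  first_byte = npos + 1;       /* 1-based position of the first byte */
  last_byte  = npos + length;  /* 1-based position of the last byte  */
run;

proc print data=positions noobs;
run;
```

A variable with NPOS 0 and LENGTH 8 therefore occupies bytes 1 through 8, and the next variable begins at offset 8, which is byte 9.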

mdoddala
Obsidian | Level 7
@Tom you said the length of your character variables is 16-24, but we have only one character variable there, called Name, and it should occupy only 8 bytes, whereas ID, Score1, Score2 and Score3 are numeric variables.
kiranv_
Rhodochrosite | Level 12

Let me explain on behalf of @Tom. Please feel free to correct me, sir, if I am wrong.

Name is 15 bytes + 3 bytes for ID; this makes 18 bytes.

But whenever you have character variables along with numeric variables in a row, the character variables need to be stored in blocks of 8 bytes.

So the 18 bytes do not fit in 8 + 8 = 16, and SAS goes to 24, taking an extra 6 bytes for those 2 variables.

Outcome: 24 (18 is not a multiple of 8, so it goes to 24) + 24 (bytes for Score1 to Score3, i.e. 3 × 8) comes to 48.

If you make Name 13 bytes (length of 13) + 3 bytes for ID, this makes 16 bytes, which is enough because it fits the 8-byte rule for character variables.

Outcome: 16 + 24 (bytes for Score1 to Score3, i.e. 3 × 8) comes to 40.

Hope this is clear.
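
The arithmetic above can be written out as a small DATA step. This is only a sketch of the rounding rule as described in this thread, not a statement of how SAS computes the value internally; the variable names are made up for illustration.

```sas
/* Round the character bytes up to the next multiple of 8 and add */
/* the numeric bytes, for Name $15 and for Name $13.              */
data _null_;
  num_bytes = 3 * 8;                        /* Score1-Score3, 8 bytes each */
  do name_len = 15, 13;
    char_bytes = 3 + name_len;              /* ID ($3) plus Name           */
    padded     = ceil(char_bytes / 8) * 8;  /* next multiple of 8          */
    obs_length = num_bytes + padded;
    put name_len= char_bytes= padded= obs_length=;
  end;
run;
```

The log lines written by PUT should show obs_length=48 for a Name of length 15 and obs_length=40 for a Name of length 13, matching the figures worked out above.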

mdoddala
Obsidian | Level 7
@novinosrin Could you help me with this?
novinosrin
Tourmaline | Level 20

Hi @mdoddala, sorry for the late response; I was away from my PC.

OK, I honestly think your question is excellent, and you beat me on this. I just read this in a SAS doc:

"Observations within a SAS data set are aligned on double-byte boundaries whenever possible. As a result, 8-byte and 4-byte numeric variables are positioned at 8-byte boundaries at the front of the data set and followed by character variables in the order in which they are encountered. If the data set only contains 4-byte numeric data, the alignment is based on 4-byte boundaries. Since numeric doubles can be operated upon directly rather than being moved and aligned before doing comparisons or increments, the boundaries cause better performance." Ref: SAS datasets doc

 

But I'm not convinced, as it really needs a strong understanding of data structure alignment and the formulas that provide the number of padding bytes:

 

  • A long (eight bytes) will be 8-byte aligned.
  • A double (eight bytes) will be 8-byte aligned.
  • An int (four bytes) will be 4-byte aligned.

 

I'll dig into this concept inside out later and share my understanding. On the other hand, if you happen to learn faster, kindly share your knowledge. Great question indeed, and thank you.

mdoddala
Obsidian | Level 7

Hi @novinosrin, so what I understood is that numeric variables have a default length of 8 bytes, whereas the length of character variables is set in the program. If you take a look at the program, the length of ID is 3, the length of Name is 15, and Score1, Score2 and Score3 are 8 each, which gives a total of 3 + 15 + 8 + 8 + 8 = 42 bytes. But the observation length is always shown as a multiple of 8 in the results, so 42 is rounded up to 48 bytes in the results tab. Please let me know if I'm wrong. Thank you so much, man, for being very responsive. In the last few days, I got really lazy about learning SAS programming ;)

mdoddala
Obsidian | Level 7
But the ID in the INPUT statement overrides the 3 bytes in the LENGTH statement with 8 bytes.
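
One way to check this claim, as a sketch: the PROC CONTENTS listing earlier in the thread reports LENGTH 3 for ID, which suggests the LENGTH statement is what sets the length and the later INPUT statement does not change it. The dataset name len_check is made up for illustration.

```sas
/* ID gets its length from the LENGTH statement, which comes first. */
/* The "$" in the INPUT statement only says to read ID as text.     */
data len_check;
  length ID $ 3;
  input ID $;
datalines;
1
2
3
;
run;

/* Look at the Len column for ID in the variable list. */
proc contents data=len_check;
run;
```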
kiranv_
Rhodochrosite | Level 12

15 + 3 = 18

So it does not fit in a multiple of 8: 16 is too small by 2 bytes, so it has to go to 24. You need to think in terms of a row: how a row is organized and what the rules for organizing it are. The layout changes when you add a numeric variable to the row. Run the example below.


/* With only character variables, the observation length is simply
   the sum of the lengths: 5 + 5 = 10 */

data a;
  length h i $5;
  h = "kj";
  i = "jj";
run;

proc contents data=a;
run;

/* With a numeric variable in the same row, the observation length
   becomes 24 */

data a;
  length h i $5;
  h = "kj";
  i = "jj";
  c = 8;
run;

/* Once a numeric variable is in the row, the character variables are
   laid out in 8-byte blocks. The 10 bytes of character data do not fit
   in 8, so they are extended to 16, which together with the 8-byte
   numeric variable gives 24 bytes. */

proc contents data=a;
run;

 

