Bitwise function use case

Frequent Contributor
Posts: 112

Not a question but rather a curious observation. In the recent thread "removing duplicates" (a misnomer since it has nothing to do with removing "duplicates"), @leahcho asked how to turn this data:

 

ID A B C

1  1  0  0
1  0  1  0
1  1  1  0
2  0  1  1
2  1  0  1

 

into this:

 

ID A B C
1  1  1  0
2  1  1  1

 

In other words, if any non-ID variable is 1 in any record within a BY group, it should become 1 in a single output record for this BY group; and if it is 0 in all records, it should remain 0. In the thread, it was solved using a variety of methods (heck, this is SAS). But what caught my attention is this is a rare case where one of the bitwise SAS functions can be put to good use. Consider:

data have ;                              
  input id (A B C) (:$1.) ;              
  cards ;                                
1  1 0 0                                 
1  0 1 0                                 
1  1 1 0                                 
2  0 1 1                                 
2  1 0 1                                 
;                                        
run ;                                    
                                         
data _null_ ;                            
  _b0 = 0 ;                              
  do until (last.id) ;                   
    set have ;                           
    by id ;                              
    _b = input (cats (A,B,C), binary3.) ;
    _b0 = BOR (_b0, _b) ;                
  end ;                                  
  put _b0=binary3. ;                     
run ;                                    

The log prints:

_b0=110
_b0=111

This is because when the BOR ("bitwise OR") function combines two numeric values, it sets each bit in its response to 1 if the respective bit in either argument is 1. Thus, after each BY group (i.e. after the DoW loop stops iterating), the consecutive characters of _B0 formatted as BINARY3. can be used to populate the required variables with the needed values. For example, in this case (i.e. when A, B, C are $1 variables with consecutive memory addresses):

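data want (drop = _:) ;
  do _b0 = 0 by 0 until (last.id) ;                    /* reset _b0 per BY group; loop to its last record */
    set have ;
    by id ;
    _b0 = BOR (_b0, input (cats (A,B,C), binary3.)) ;  /* accumulate the OR of the A,B,C bit patterns */
  end ;
  call pokelong (put (_b0, binary3.), addrlong (A)) ;  /* write 3 characters into memory starting at A */
run ;                                                  /* implicit output: one record per BY group */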
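
For readers who would rather not write to memory directly, here is a minimal sketch of the same step with explicit assignments instead of CALL POKELONG. It assumes the HAVE data set created above; the helper variable _S is introduced here only for illustration and is not part of the original code.

```sas
data want (drop = _:) ;
  do _b0 = 0 by 0 until (last.id) ;
    set have ;
    by id ;
    _b0 = bor (_b0, input (cats (A,B,C), binary3.)) ;
  end ;
  /* _s (hypothetical helper) holds the 3-character bit string, e.g. "110" */
  _s = put (_b0, binary3.) ;
  /* assign each character to its variable explicitly */
  A = char (_s, 1) ;
  B = char (_s, 2) ;
  C = char (_s, 3) ;
run ;
```

This version does not depend on A, B, C being stored at consecutive addresses, at the cost of one assignment per variable.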

FWIW

Paul D.

 

 

Super User
Posts: 2,049

Re: Bitwise function use case

A request:  

If the one with APP could accommodate a little more explanation or comments, it would help intermediate-level users too. Please.

Of course, referring to your paper and the code at

www2.sas.com/proceedings/sugi29/264-29.pdf
will eventually make the wider audience understand, but that's a steep learning curve for anybody who isn't anywhere near PD yet.
 

 

 

Frequent Contributor
Posts: 112

Re: Bitwise function use case

Posted in reply to novinosrin

@novinosrin:

 

There's nothing complex about it. The compiler sees A, B, C consecutively, and thus stores them physically at addresses following each other. For example, if I run:

data _null_ ;                           
  set have (obs=1) ;                    
  addr_A = input (addrlong (A), pib4.) ;
  addr_B = input (addrlong (B), pib4.) ;
  addr_C = input (addrlong (C), pib4.) ;
  put (addr:) (=/) ;                     
run ;                                   

I see in the log:

addr_A=278362552
addr_B=278362553
addr_C=278362554

Your figures will be different (in fact, they change from execution to execution), but as you can see, A, B, C are placed into memory one after another according to their $1 length. Thus, when you use CALL POKELONG to stick the 3-byte string PUT (_B0, BINARY3.) directly into memory beginning with the address of A, it replaces the values of A, B, C with whatever bytes the string has; and the values of A, B, C are changed accordingly - without actually assigning them explicitly to A, B, C.
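
The read direction can be sketched the same way. The step below assumes the HAVE data set from the first post and relies on the same assumption as above, namely that A, B, C sit at consecutive addresses; the variable name ABC is made up for the example. It uses PEEKCLONG to fetch 3 bytes starting at the address of A.

```sas
data _null_ ;
  set have (obs=1) ;
  length abc $3 ;
  /* read 3 bytes of memory starting at the address of A */
  abc = peekclong (addrlong (A), 3) ;
  /* if A, B, C are adjacent, abc is their concatenation */
  put A= B= C= abc= ;
run ;
```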

 

If you wanted to do the same with a numeric variable, you'd have to replace its entire RB8 image in memory with the RB8 image of the replacement value. For example:

data _null_ ;
  retain A B C 5 ;
  put "Before: " (A B C) (=) ;
  replace = put (1, rb8.) || put (2, rb8.) || put (3, rb8.) ;
  call pokelong (replace, addrlong (A)) ;
  put "After: " (A B C) (=) ;
run ;

And the log will report:     

Before: A=5 B=5 C=5
After:  A=1 B=2 C=3

For much more and when and why this kind of utility comes in especially handy, just peruse the paper you've cited.

 

HTH

Paul D.

 

p.s. The reason I use the LONG APP functions here is that I run it on a 64-bit machine, where the non-LONG APP functions don't work. SAS introduced the "long" functions around 2000 because they realized that on 64-bit machines address numbers could exceed the 16-digit precision of the SAS numeric (RB8.) variable. So, SAS decided to store addresses as character values, which is the same as storing them as 256-radix numbers. (A bit of history: At PNWSUG 1999, Rick Langston from SAS cornered me and asked: "Since you're the only one who uses these functions, how do you think this 64-bit situation should be handled?" I said, heck, extend the SAS numeric variable to 128 bits. You'll both solve the problem and extend its integer precision from 2**53 to 2**117 - that's 35 digits - if you keep the length of the exponent the same. Rick said it wasn't possible. And, lo and behold, the SAS numeric variable is still RB8, i.e. 64 bits.)

 

ADDRLONG returns a 20-byte string, so it can potentially store, without loss of accuracy, an address as large as 256**20-1. It's a kind of overkill since I've never seen more than 4 non-blank bytes in an ADDRLONG response on any 64-bit platform I've tried - meaning that the actual addresses don't exceed 256**4-1=4,294,967,295, and so currently there wouldn't be any concern with the RB8. precision. Yet SAS has still made the non-LONG APP functions invalid on the 64-bit platforms ... and that's the way it is. Since ADDRLONG returns a character value, in order to "see" the actual numeric address, I have to convert the 4 non-blank bytes into a number using the PIB4. informat, which essentially converts a 256-radix number (i.e. a character string) into a 10-radix integer.

 

On 32-bit platforms, it's much easier because the non-LONG APP functions return a numeric value. Hence, if you have address A and address B and want to find how distant they are from each other, you just subtract addr(A) from addr(B). On 64-bit platforms, you have to do the PIB4. conversions to accomplish that. Or, if you want to do something at the address 100 bytes higher than address A, on a 32-bit platform you simply add 100 to address A. On a 64-bit platform, you'd have to (a) convert address A to a number using the PIB4. informat, (b) add 100 to that, and (c) convert the result back to a character value using the PIB4. format. I guess the inconvenience was realized at SAS since lately they have, thankfully, added the PTRLONGADD function, so one no longer has to go through this PITA. It would be nice if they also added something like ADDRLONGDIFF (address1, address2), but I'm not holding my breath.
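
As a small illustration of PTRLONGADD, here is a sketch that offsets a character pointer without any PIB4. round trip. The variable names S, P and C are invented for the example; it stays within a single 8-byte string, so it does not rely on how separate variables are laid out in memory.

```sas
data _null_ ;
  length s $8 c $1 ;
  s = 'ABCDEFGH' ;
  /* pointer to the byte 3 positions past the start of s */
  p = ptrlongadd (addrlong (s), 3) ;
  /* read 1 byte at that address, i.e. the 4th character of s */
  c = peekclong (p, 1) ;
  put s= c= ;
run ;
```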

 
