Solved: Re: Code to get combinations

sg_kr · Posted 07-26-2018 01:21 PM

Hi ,

I have dataset agent_details with produt code(pdt_cde) ,agent_name and points for each agent.

pdt_cde	agent_name	points
201	a	10
202	b	20
203	c	30
201	d	40
204	e	50
203	f	60
202	g	70
201	h	80
202	i	90
201	j	100

i want the sharing combinations of agent by product and who ever is the first agent in the sharing his points should be populated like below.

Name1	nam2	name3	name4	name5	points
a	d	h	j		10
b	g	i			20
c	f				30
d	h	j	a		40
e					50
f	c				60
g	i	b			70
h	j	d	a		80
i	b	g			90
j	a	d	h		100

for this i tried to do it in do loops by using first.product.

*************************************

data sharing

set agent_details;

length %do y=1 %to 7; Agtname&y. $70. %end;;

retain %do y=1 %to 7; Agtname&y. %end;;

if first.pdt_cde_no then do;

%do y=1 %to 7;

Agtname&y.='';

%end;

end;

%do y=1 %to 3;

if Count=&y. then do;

Agtname&y.=Agtname;

end;

%end;

But im not getting likely, so could anyone please help me on this.

thanks in advance!!!

hashman · Posted 07-26-2018 04:48 PM

Your sample output isn't quite consistent with respect to the order in which the names are collected. To wit, for the record with points=80, you first collect the items with pdt_cde=201 from that point down (h, j) and then - from that point up (d, a). Yet for the record with points=90, you have nothing to find downwards for pdt_cde=202 but then proceed to collect the items not from that point up (g, b) but from the top of the table down (b, g). So, there're 3 possibilities I see:

1. You don't care in which order the agent names are collected for the same pdt_cde.

2. You want to collect from the current record down, then from the top of the table down.

3. You want to collect from the current record down, then from this record up.

However, whatever the case, first you need to find out how many agent name variables you need. It's easy:

data have ;                                                    
  input pdt_cde agent_name $ points ;                          
  cards ;                                                      
201  a  10                                                     
202  b  20                                                     
203  c  30                                                     
201  d  40                                                     
204  e  50                                                     
203  f  60                                                     
202  g  70                                                     
201  h  80                                                     
202  i  90                                                     
201  j  100                                                    
;                                                              
run ;                                                          
                                                               
proc sql noprint ;                                             
  select cats (max(q)) into: nv                                
  from (select pdt_cde, count (pdt_cde) as q from have group 1)
  ;                                                            
quit ;

Now let's look at the choice #1 above (you don't care about the collection order) since it's the easiest to code:

data want (keep = points name:) ;               
  if _n_ = 1 then do ;                          
    dcl hash h (dataset:"have", multidata:"y") ;
    h.definekey ("pdt_cde") ;                   
    h.definedata ("pdt_cde", "agent_name") ;    
    h.definedone () ;                           
  end ;                                         
  array name (j) $ name1 - name&nv ;            
  set have ;                                    
  call missing (of name[*]) ;                   
  do j = 1 by 1 while (h.do_over() = 0) ;       
    name = agent_name ;                         
  end ;                                         
run ;

The case #2 is more involved since now we have to traverse each same-key hash item group twice, though thankfully in the same direction. This appears to generate the output closest to what you've shown.

data want (keep = points name:) ;                        
  dcl hash h (dataset:"have", multidata:"y") ;           
  h.definekey ("pdt_cde") ;                              
  h.definedata ("pdt_cde", "agent_name") ;               
  h.definedone () ;                                      
  h.output (dataset:"hash") ;                            
  array name (j) $ name1 - name&nv ;                     
  do until (z) ;                                         
    set have (rename = (agent_name = an)) end = z ;      
    call missing (of name[*]) ;                          
    if h.find() = 0 then do until (h.find_next() ne 0) ; 
      if an = agent_name then leave ;                    
    end ;                                                
    j = 1 ;                                              
    name = agent_name ;                                  
    h.has_next (result:R) ;                              
    do while (R ne 0) ;                                  
      h.find_next() ;                                    
      j + 1 ;                                            
      name = agent_name ;                                
      h.has_next (result:R) ;                            
    end ;                                                
    if h.find() = 0 then do until (h.find_next() ne 0) ; 
      if an = agent_name then leave ;                    
      j + 1 ;                                            
      name = agent_name ;                                
    end ;                                                
    output ;                                             
  end ;                                                  
  stop ;                                                 
  set have (keep = agent_name) ;                         
run ;

Finally, the case #3 involves still more hash table traversing gymnastics since we now need to traverse each same-key hash item group both down and up. (But I do admit that it lends itself to a bit of creative hash object programming.)

data want (keep = points name:) ;                       
  dcl hash h (dataset:"have", multidata:"y") ;          
  h.definekey ("pdt_cde") ;                             
  h.definedata ("pdt_cde", "agent_name") ;              
  h.definedone () ;                                     
  h.output (dataset:"hash") ;                           
  array name (j) $ name1 - name&nv ;                    
  do until (z) ;                                        
    set have (rename = (agent_name = an)) end = z ;     
    call missing (of name[*]) ;                         
    if h.find() = 0 then do until (h.find_next() ne 0) ;
      if an = agent_name then leave ;                   
    end ;                                               
    j = 1 ;                                             
    name = agent_name ;                                 
    h.has_next (result:R) ;                             
    do while (R ne 0) ;                                 
      h.find_next() ;                                   
      j + 1 ;                                           
      name = agent_name ;                               
      h.has_next (result:R) ;                           
    end ;                                               
    if h.find() = 0 then do until (h.find_next() ne 0) ;
      if an = agent_name then leave ;                   
    end ;                                               
    h.has_prev (result:R) ;                             
    do while (R ne 0) ;                                 
      h.find_prev() ;                                   
      j + 1 ;                                           
      name = agent_name ;                               
      h.has_prev (result:R) ;                           
    end ;                                               
    output ;                                            
  end ;                                                 
  stop ;                                                
  set have (keep = agent_name) ;                        
run ;

HTH

Paul D.

p.s. Quite a few interesting variations and examples on this kind of theme are found in the SAS Press just published (penned together with Don Henderson):

https://support.sas.com/en/books/authors/paul-dorfman.html

View solution in original post

novinosrin · Posted 07-26-2018 02:02 PM

Not the most efficient, but if urgent use until you get a better one form SAS champs:

data have;
input pdt_cde	agent_name $	points;
cards;
201	a	10
202	b	20
203	c	30
201	d	40
204	e	50
203	f	60
202	g	70
201	h	80
202	i	90
201	j	100
;

proc sort data=have out=_have;
by pdt_cde ;
run;
proc transpose data=_have out=w prefix=name;
by pdt_cde;
var agent_name;
run;

data want;
merge _have w;
by pdt_cde;
array t(*) name:;
k=whichc(agent_name, of t(*));
temp=t(K);
call missing(t(k));
output;
t(K)=temp;
keep name: agent_name points;
run;

novinosrin · Posted 07-26-2018 03:27 PM

/*Or*/

data have;
input pdt_cde	agent_name $	points;
cards;
201	a	10
202	b	20
203	c	30
201	d	40
204	e	50
203	f	60
202	g	70
201	h	80
202	i	90
201	j	100
;
proc sql;
create table w as 
select a.pdt_cde,b.points,a.agent_name,b.agent_name as _agent_name
from have a left join have b
on a.pdt_cde=b.pdt_cde
order by b.agent_name,a.agent_name;
quit;

proc transpose data=w out=want(drop=pdt_cde _:) prefix=name;
by pdt_cde points _agent_name  notsorted;
var agent_name;
run;

novinosrin · Posted 07-26-2018 04:16 PM

data have;
input pdt_cde	agent_name $	points;
cards;
201	a	10
202	b	20
203	c	30
201	d	40
204	e	50
203	f	60
202	g	70
201	h	80
202	i	90
201	j	100
;
data w;
if _n_=1 then do;
 dcl hash h(dataset:'have', multidata: 'y');
 h.definekey('pdt_cde');
 h.definedata('agent_name');
 h.definedone();
 end;
set have;
do while(h.do_over(key:pdt_cde) eq 0);
output;
end;
run;
proc transpose data=w out=want(drop=_name_) prefix=name;
by points notsorted;
var agent_name;
run;

hashman · Posted 07-26-2018 04:48 PM

Your sample output isn't quite consistent with respect to the order in which the names are collected. To wit, for the record with points=80, you first collect the items with pdt_cde=201 from that point down (h, j) and then - from that point up (d, a). Yet for the record with points=90, you have nothing to find downwards for pdt_cde=202 but then proceed to collect the items not from that point up (g, b) but from the top of the table down (b, g). So, there're 3 possibilities I see:

1. You don't care in which order the agent names are collected for the same pdt_cde.

2. You want to collect from the current record down, then from the top of the table down.

3. You want to collect from the current record down, then from this record up.

However, whatever the case, first you need to find out how many agent name variables you need. It's easy:

data have ;                                                    
  input pdt_cde agent_name $ points ;                          
  cards ;                                                      
201  a  10                                                     
202  b  20                                                     
203  c  30                                                     
201  d  40                                                     
204  e  50                                                     
203  f  60                                                     
202  g  70                                                     
201  h  80                                                     
202  i  90                                                     
201  j  100                                                    
;                                                              
run ;                                                          
                                                               
proc sql noprint ;                                             
  select cats (max(q)) into: nv                                
  from (select pdt_cde, count (pdt_cde) as q from have group 1)
  ;                                                            
quit ;

Now let's look at the choice #1 above (you don't care about the collection order) since it's the easiest to code:

data want (keep = points name:) ;               
  if _n_ = 1 then do ;                          
    dcl hash h (dataset:"have", multidata:"y") ;
    h.definekey ("pdt_cde") ;                   
    h.definedata ("pdt_cde", "agent_name") ;    
    h.definedone () ;                           
  end ;                                         
  array name (j) $ name1 - name&nv ;            
  set have ;                                    
  call missing (of name[*]) ;                   
  do j = 1 by 1 while (h.do_over() = 0) ;       
    name = agent_name ;                         
  end ;                                         
run ;

The case #2 is more involved since now we have to traverse each same-key hash item group twice, though thankfully in the same direction. This appears to generate the output closest to what you've shown.

data want (keep = points name:) ;                        
  dcl hash h (dataset:"have", multidata:"y") ;           
  h.definekey ("pdt_cde") ;                              
  h.definedata ("pdt_cde", "agent_name") ;               
  h.definedone () ;                                      
  h.output (dataset:"hash") ;                            
  array name (j) $ name1 - name&nv ;                     
  do until (z) ;                                         
    set have (rename = (agent_name = an)) end = z ;      
    call missing (of name[*]) ;                          
    if h.find() = 0 then do until (h.find_next() ne 0) ; 
      if an = agent_name then leave ;                    
    end ;                                                
    j = 1 ;                                              
    name = agent_name ;                                  
    h.has_next (result:R) ;                              
    do while (R ne 0) ;                                  
      h.find_next() ;                                    
      j + 1 ;                                            
      name = agent_name ;                                
      h.has_next (result:R) ;                            
    end ;                                                
    if h.find() = 0 then do until (h.find_next() ne 0) ; 
      if an = agent_name then leave ;                    
      j + 1 ;                                            
      name = agent_name ;                                
    end ;                                                
    output ;                                             
  end ;                                                  
  stop ;                                                 
  set have (keep = agent_name) ;                         
run ;

Finally, the case #3 involves still more hash table traversing gymnastics since we now need to traverse each same-key hash item group both down and up. (But I do admit that it lends itself to a bit of creative hash object programming.)

data want (keep = points name:) ;                       
  dcl hash h (dataset:"have", multidata:"y") ;          
  h.definekey ("pdt_cde") ;                             
  h.definedata ("pdt_cde", "agent_name") ;              
  h.definedone () ;                                     
  h.output (dataset:"hash") ;                           
  array name (j) $ name1 - name&nv ;                    
  do until (z) ;                                        
    set have (rename = (agent_name = an)) end = z ;     
    call missing (of name[*]) ;                         
    if h.find() = 0 then do until (h.find_next() ne 0) ;
      if an = agent_name then leave ;                   
    end ;                                               
    j = 1 ;                                             
    name = agent_name ;                                 
    h.has_next (result:R) ;                             
    do while (R ne 0) ;                                 
      h.find_next() ;                                   
      j + 1 ;                                           
      name = agent_name ;                               
      h.has_next (result:R) ;                           
    end ;                                               
    if h.find() = 0 then do until (h.find_next() ne 0) ;
      if an = agent_name then leave ;                   
    end ;                                               
    h.has_prev (result:R) ;                             
    do while (R ne 0) ;                                 
      h.find_prev() ;                                   
      j + 1 ;                                           
      name = agent_name ;                               
      h.has_prev (result:R) ;                           
    end ;                                               
    output ;                                            
  end ;                                                 
  stop ;                                                
  set have (keep = agent_name) ;                        
run ;

HTH

Paul D.

p.s. Quite a few interesting variations and examples on this kind of theme are found in the SAS Press just published (penned together with Don Henderson):

https://support.sas.com/en/books/authors/paul-dorfman.html

novinosrin · Posted 07-26-2018 05:48 PM

@hashman My oh my Sir, you beat the documentation. If you have a min,

  array name (j) $ name1 - name&nv ;

How does array subscript not have a number and variable operand j?

Where have i missed in the documentation? 😞

subscript}

describes the number and arrangement of elements in the array by using an asterisk, a number, or a range of numbers. Subscript has one of these forms:

{dimension-size(s)}

specifies the number of elements in each dimension of the array. Dimension-size is a numeric representation of either the number of elements in a one-dimensional array or the number of elements in each dimension of a multidimensional array.

Tip:

You can enclose the subscript in braces ({}), brackets ( [ ] ) or parentheses (( )).

Example:

An array with one dimension can be defined as

array simple{3} red green yellow;

This ARRAY statement defines an array that is named SIMPLE that groups together three variables that are named RED, GREEN, and YELLOW.

Example:

An array with more than one dimension is known as a multidimensional array. You can have any number of dimensions in a multidimensional array. For example, a two-dimensional array provides row and column arrangement of array elements. This statement defines a two-dimensional array with five rows and three columns:

array x{5,3} score1-score15;

SAS places variables into a two-dimensional array by filling all rows in order, beginning at the upper-left corner of the array (known as row-major order).

hashman · Posted 07-26-2018 07:14 PM

Hi, @novinosrin! This is an implicitly subscripted array. Though SAS fully supports such arrays, they have apparently decided to no longer include them in the docs, which supposedly serves to discourage their use. But I love them and use when appropriate; what is "appropriate", I determine myself ;). Just plug "implicit SAS array" into Google, and you'll find all you need. Here's a minute demo, just a teaser:

data _null_ ;                              
  array nn A B C D E (1 2 3 4 5) ;         
                                           
  do over nn ;                             
    put _i_= nn= ;                         
  end ;                                    
  _i_ = 3 ;                                
  nn = 33 ;                                
  put / nn= / ;                            
                                           
  array cc (j) $1 C1-C4 ("A" "B" "C" "D") ;
                                           
  do over cc ;                             
    put j= cc= ;                           
    if j in (2, 4) then cc = "Z" ;         
  end ;                                    
  put / (cc[*]) (=) ;                      
run ;

What you'll see in the log is:

_I_=1 A=1
_I_=2 B=2
_I_=3 C=3
_I_=4 D=4
_I_=5 E=5

C=33

j=1 C1=A
j=2 C2=B
j=3 C3=C
j=4 C4=D

C1=A C2=Z C3=C C4=Z

As you see, if you don't specify the name of your own index variable (like the j variable above), the automatic variable _I_ is used. It's also auto-dropped (but if you specify your own index variable, it's not). You cannot reference these arrays as NN[_i_] or CC[j] - that's why they are "implicit". Whenever you reference the array name like NN or CC, you're referencing the array element with the current PDV value of its index.

I like the DO OVER construct for its brevity and self-explanatory nature - it simply scans the array from the lower to higher bound. Interestingly, while having dropped the implicit arrays from the docs, SAS introduced the DO_OVER hash object method in 9.4, apparently named after you guessed what.

Another rather unique feature of implicit arrays (hat explicit ones simply don't have) is their ability to incorporate other implicit arrays by name as their elements and loop through them one at a time.

Have fun.

Paul D.

novinosrin · Posted 07-26-2018 07:33 PM

Thank you Sir @hashman. That level diligence can only come from you. More than anything, I can't appreciate enough your generosity to lend your time to explain at length with examples. I mean it and I value it and hopefully I should honor the commitment that I have given to myself to bother you only when there isn't otherwise.

Likewise, as I begin Chapter 3: Item-Level Operations: Direct Access having completed chapters 1 and 2 in the book, I have already noticed and paid serious attention to many of the nuances and intricacies that you brilliantly explained that are not quite noticeable in the docs, and I can only imagine lot of surprises to come.

Many thanks once again. Have a great evening

hashman · Posted 07-26-2018 09:35 PM

@novinosrin, you're most welcome, Sir. Please don't hesitate to ask anything if you feel like it. That's what we are here for, aren't we?

Thanks 1E6 for reading our hash opus. Explaining and exemplifying things and use cases not quite obvious from the docs or even never mentioned there was one of our main goals from the outset. If it helps someone, it's the most satisfaction we can add to the geeky pleasure of writing the thing.

Best

Paul D.

sg_kr · Posted 07-27-2018 06:28 AM

thanks for your response and solution sir.

if you look at the below code, which i have posted its working fine for me but i am not get the other combinations.

if could you do the modifications in the same code, that will be really helpful for me.

thanks in advance

hashman · Posted 07-27-2018 07:54 AM

@sg_kr, I would't mind but what code of yours are you referring to? I don't see code "below".

sg_kr · Posted 07-27-2018 08:24 AM

data sharing

set agent_details;

length %do y=1 %to 7; Agtname&y. $70. %end;;

retain %do y=1 %to 7; Agtname&y. %end;;

if first.pdt_cde_no then do;

%do y=1 %to 7;

Agtname&y.='';

%end;

end;

%do y=1 %to 3;

if Count=&y. then do;

Agtname&y.=Agtname;

end;

%end;

hashman · Posted 07-27-2018 12:43 PM

I'm sorry to say but there are too many things wrong with this text to become valid SAS code. You have macro statement but don't have a macro they're part of. You have a first.x variable but have no BY statement. And even if you had, your agent_details data set is unsorted, so BY processing wouldn't work.

Perhaps you meant to sort it and then operate on the sorted data set. I can see the benefit of working against the sorted file; but I can't see how this intent (if I discerned it correctly from your code) can get you where you want to get.

Just speculating, it looks as though you may have intended to traverse each BY group and then, for each current record from it, re-traverse it collecting all the names except for the current record - all solely by reading the file and without table lookup. It's possible; however, then if you care about the original order in which the points are listed, you'll have to re-sort the output back by points. Also, it requires reading the data thrice in the DATA step alone: First, to establish the BY group endpoints (one pass) and then to cross the records (two more). For example:

proc sort data = have out = haves ;            
  by pdt_cde points ;                          
run ;                                          
                                               
data want (keep = points name:) ;              
  do i = 1 by 1 until (last.pdt_cde) ;         
    set haves (keep = pdt_cde) ;               
    by pdt_cde ;                               
    q ++ 1 ;                                   
    if i = 1 then qb = q ;                     
  end ;                                        
  qe = q ;                                     
  array name $ name1 - name4 ;                 
  do i = qb to qe ;                            
    set haves (keep = points agent_name) ;     
    _i_ = 1 ;                                  
    name = agent_name ;                        
    do j = qb to qe ;                          
      if i = j then continue ;                 
      set haves (keep = agent_name) point = j ;
      _i_ + 1 ;                                
      name = agent_name ;                      
    end ;                                      
    output ;                                   
  end ;                                        
run ;                                          
                                               
proc sort data = want ;                        
  by points ;                                  
run ;

In which case, you'd get the following output:

name1 name2 name3 name4 points

a	d	h	j	10
b	g	i		20
c	f			30
d	a	h	j	40
e				50
f	c			60
g	b	i		70
h	a	d	j	80
i	b	g		90
j	a	d	h	100

Note that sorting, multi-pass reads, and re-sorting are okay when the data volume lends itself to this kind of processing. In some lines of business (such as clinical trials with their lilliputian data) it's immaterial; in others (like telops or healthcare) it can be downright prohibitive. (Most of my pro life I've dealt with data that totally negates failing to pay close attention to the machine aspect of coding efficiency. That's why in the first "solution" I offered you I succumbed to the habit of completing processing in a single-pass through the original unsorted file.)

HTH

Paul D.

novinosrin · Posted 07-27-2018 01:46 PM

@hashman When Thanks* infinite has already been mentioned in here and couple other posts, I am not gonna repeat that again. So sorry. 🙂

boss, At your convenient time and when and if you can , regarding efficiency ?

As i progress through the book, you seem to prefer do until end=lr pass and then stop than a conventional one. It does make sense to me for further processing et al. And if you wanna reread with yet another do until or new datastep, how much of significance you notice in I/0, Memory metrics?

PS

On a hilarious note , to make you chuckle, i was reading through the book on the chicago red line on my way home, and somebody noticed the line "HOW of the Hash obj" and grinned commenting very interesting

hashman · Posted 07-27-2018 05:12 PM

@novinosrin,

Thanks for the thanks and again, for reading the opus. Performance-wise, the explicit DO mattered in the 1980s. Today, it makes no difference. The style is largely dictated by the programming structure and convenience, such as ridding of RETAINs or the need to surround the CREATE operation block with IF _N_=1, and so on. Of course, it you want to process a file, then do something (like compute) based on the results and then read it again, it's much more handy to do that something between two explicit full-file DoW loops rather than engage in the IN= gymnastics within the implied loop. So basically, this style is used when it lends more directly to the stream-of-the-consciousness logic of the program.

Thanks for the amusement ;);

Best

Paul D.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!