Solved: Re: How to find min variable when there are two or more value exist?

maozsoy · Posted 10-05-2019 05:03 PM

Hello,

I am using SAS EG 7.1.

My purpose is find to minimum value of FEE_A - ... - FEE_D columns and print to column name as 'ASSIGNMENT'. If two or more minimum value is exist this time i want to print column name have maximum value of SCORE_A - ... SCORE_D columns.

My data is:

I try to add 'Assignment' column.

On first, second and third row; there are only one minimum value of Fee_A - ... - Fee_D.

On fourth and fifth row; there are equal minimum value of fee_.. columns (eg. for fourth row; FEE_B = FEE_D = 2) But SCORE_D > SCORE_B (100>90) so value of ASSIGNMENT column is D.

On sixth row; there are two minimum value (FEE_C = FEE_D = 2) and values of SCORE_C and SCORE_D are equal. (60) For this case it doesn't matter what print 'C' or 'D' value.

For insert my data:

data table_1;                    
input 
ID SCORE_A SCORE_B SCORE_C SCORE_D
FEE_A FEE_B FEE_C FEE_D;
 
datalines;  
111111  70  85  50    60    2  4  3  8
222222  80  65  80    70    9  4  5  1
333333  60  70  70    100   5  3  5  4
444444  60  90  100   100   7  2  3  2
555555  40  70  80    50    4  1  5  1
666666  75  80  60    60    9  3  2  2
; 
run;

I try to write code below, but i could not. In addition, actually I have approximately 100 columns both 'score_...' and 'fee_...' so I have to use macro and array.

data work.table_2;
set work.table_1;

array all_fee [*] &fee.;  /*i create this macro before*/

counter = 0;
do i=1 to dim(all_fee);
if all_fee{i} = min(of all_fee(*)) then counter = counter + 1; else counter = counter;
end;

length multi_min $ 500;
length ASSIGNMENT $ 500;

do i=1 to dim(all_fee);

 if counter = 0 then assignment = '' ; end;
 else if counter = 1 then do;
        if all_fee{i} = min(of all_fee(*)) then do;
        assignment = substr((vname(all_fee{i})), 5, (length((vname(all_fee{i})))-4));,
        end;
       end;
 else if counter > 1 then do;
        if all_fee{i} = min(of all_fee(*)) then do;
        multi_min = catx(' ', multi_min, (substr((vname(all_fee{i})), 5, (length((vname(all_fee{i})))-4))) );
        end;
        end;
 else assignment = .;

end;

drop i;
run;

How should I write this code or Is there any shorter way? Can you help me?

Best Regards,

Thank you

s_lassen · Posted 10-07-2019 03:55 AM

If you are working with shuffled array (name suffixes not in the same order in the two sets), a bit of macro stuff may be necessary:

%macro listvars(prefix,suffixes);
  %local i w;
  %do i=1 %to 999999;
    %let w=%scan(&suffixes,&i);
	%if %length(&w)=0 %then %return;
	%do; &prefix.&w%end;
	%end;
%mend;

data want (drop = _:) ;                                             
  set have ;                                                        
  array s %listvars(SCORE_,A B C D);                                                 
  array f %listvars(FEE_,A B C D);                                                   
  do over s ;                                                       
    if f ne min (of f[*]) or s <= _ms then continue ;               
    _ms = s ;                                                       
    assignment = put (scan (vname (f), 2, "_"), $upcase32.) ;       
  end ;                                                             
run ;

Apart from the new array definitions, the solution is the one presented by @hashman. The macro just generates a list like " FEE_A FEE_B FEE_C FEE_D". You can also skip the macro and write the full lists in your program, if you prefer.

Regarding your other questions:

DO OVER essentially loops through the array, using a temporary variable _I_ as the array iterator. Essentially it is equal to DO _I_=LBOUND(<array>) TO HBOUND(<array>). This shorthand is not documented anymore, but I suspect that SAS will keep it anyway, as removing may break a lot of old programs. One word of warning: Do not nest DO OVER loops! The inner and the outer loop will use the same iterator, _I_, which will break the program.
CONTINUE means "jump to the next iteration of the loop", so the statements inside the loop following the CONTINUE statement will not be executed.

View solution in original post

hashman · Posted 10-05-2019 09:19 PM

@maozsoy:

In principle, you seem to head in the right direction by using arrays. But there's no need for macros of any kind, sort, or shape: You need to only diligently execute a straightforward piece of array logic. Also, using A, B, etc. as values for ASSIGNMENT is precarious: Since, as you say, you have 100+ variables like SCORE* and FEE* in your real file, you don't have enough letters to mark all of them apart. Perhaps you have combinations of letters and/or digits after the underscore in the variables' names, but you aren't telling us. At any rate, below, the part of the variable name after the underscore is used, whatever it is, which precludes any such collisions.

data have ;                                                         
  input id score_a score_b score_c score_d fee_a fee_b fee_c fee_d ;
  cards ;                                                           
111111  70  85  50    60    2  4  3  8                              
222222  80  65  80    70    9  4  5  1                              
333333  60  70  70    100   5  3  5  4                              
444444  60  90  100   100   7  2  3  2                              
555555  40  70  80    50    4  1  5  1                              
666666  75  80  60    60    9  3  2  2                              
;                                                                   
run ;                                                               
                                                                    
data want (drop = _:) ;                                             
  set have ;                                                        
  array s score_: ;                                                 
  array f fee_: ;                                                   
  do over s ;                                                       
    if f ne min (of f[*]) or s <= _ms then continue ;               
    _ms = s ;                                                       
    assignment = put (scan (vname (f), 2, "_"), $upcase32.) ;       
  end ;                                                             
run ;

Note that the "max" logic vis-a-vis the auxiliary _MS variable is partly made possible by the fact that the unretained _MS is auto-set to a missing value at the top of the DATA step implied loop.

Kind regards

Paul D.

maozsoy · Posted 10-07-2019 03:07 AM

@hashman I am very pleased for your contributions.

Unfortunately i couldn't understand what exactly difference "do over" and "do i=1 to dim(s)". And I guess "... then continue;" reference the next two lines, right it?

In addition; this structure don't work when "score_..." columns and "fee_..." columns are shuffled. For example if my data is like below, it doesn't result that i want. (I changed the order of 'score_d' and 'score_b')

data have1 ;                                                         
  input id score_a score_d score_c score_b fee_a fee_b fee_c fee_d ;
  cards ;                                                           
111111  70  60  50   85    2  4  3  8                              
222222  80  70  80   65    9  4  5  1                              
333333  60  100 70   70    5  3  5  4                              
444444  60  100 100  90    7  2  3  2                              
555555  40  50  80   70    4  1  5  1                              
666666  75  60  60   80    9  3  2  2                              
;                                                                   
run ;

hashman · Posted 10-07-2019 03:40 AM

@maozsoy:

When the array is defined as:

array s score_: ;

it automatically creates an implicitly subscripted array S with the array index as an automatic variable _I_ (which is automatically dropped from the output). Then:

do over s ;

...

reference to s ...

...

end ;

means the same as:

do _i_ = 1 to dim (s) ;

...

reference to s[_i_] ...

...

end ;

So. basically, there's no difference between the @Ksharp's code and mine in this respect, except that he's using an explicitly subsripted array.

CONTINUE within a DO loop means: If the condition preceding it is true, pass program control to the bottom of the loop (i.e. the location right before END), ignoring all instructions between it and the bottom of the loop, and go to the next iteration of the loop.

If you shuffle either SCORE* or FEE* variables, so that their suffixes don't align, you redefine the task and need to specify which variable - SCORE* or FEE* - should supply its suffix to ASSIGNMENT after the minimal FEE is found for the maximum SCORE that resolves the ties among more than one minimal FEE.

I don't understand why you keep referencing some kind of macro. Macros have nothing to do with this task or code. Remember that macros are nothing but one of the means to assemble executable SAS code - and you have no code to assemble here. All you need is a straightforward (and correct according to the specs) piece of DATA step logic.

Kind regards

Paul D.

maozsoy · Posted 10-07-2019 04:11 AM

Thanks for explanation of 'array' and 'continue' templates, ok.

I am thinking i need to use macro. Because the order of columns may vary in the table. But my code should not be affected this case.

Actually i mean;

a, b, c... are my products that have names differently.

a ==> fee_a, score_a

b ==> fee_b, score_b

c ==> fee_c, score_c

Also they can called as;

pencil ==> fee_pencil, score_pencil (different features of the same product-pencil-)

book ==> fee_book, score_book (different features of the same product-book-)

rubber ==> fee_rubber, score_rubber (different features of the same product-rubber-)

So although their order is different (eg. score_d - score_b - fee_b - fee_d), my code structure should examine score_d for fee_d and should examine score_b for fee_b same way.

I hope i could explain 🙂

s_lassen · Posted 10-07-2019 03:55 AM

If you are working with shuffled array (name suffixes not in the same order in the two sets), a bit of macro stuff may be necessary:

%macro listvars(prefix,suffixes);
  %local i w;
  %do i=1 %to 999999;
    %let w=%scan(&suffixes,&i);
	%if %length(&w)=0 %then %return;
	%do; &prefix.&w%end;
	%end;
%mend;

data want (drop = _:) ;                                             
  set have ;                                                        
  array s %listvars(SCORE_,A B C D);                                                 
  array f %listvars(FEE_,A B C D);                                                   
  do over s ;                                                       
    if f ne min (of f[*]) or s <= _ms then continue ;               
    _ms = s ;                                                       
    assignment = put (scan (vname (f), 2, "_"), $upcase32.) ;       
  end ;                                                             
run ;

Apart from the new array definitions, the solution is the one presented by @hashman. The macro just generates a list like " FEE_A FEE_B FEE_C FEE_D". You can also skip the macro and write the full lists in your program, if you prefer.

Regarding your other questions:

DO OVER essentially loops through the array, using a temporary variable _I_ as the array iterator. Essentially it is equal to DO _I_=LBOUND(<array>) TO HBOUND(<array>). This shorthand is not documented anymore, but I suspect that SAS will keep it anyway, as removing may break a lot of old programs. One word of warning: Do not nest DO OVER loops! The inner and the outer loop will use the same iterator, _I_, which will break the program.
CONTINUE means "jump to the next iteration of the loop", so the statements inside the loop following the CONTINUE statement will not be executed.

hashman · Posted 10-07-2019 04:43 AM

@s_lassen:

You can nest DO loops over implicit arrays all you want if you define the arrays with different indices, e.g.:

data _null_ ;                                                                                                                                                                                                                                                   
  array ii (i) i1-i3 (1 2 3) ;                                                                                                                                                                                                                                  
  array jj (j) j1-j2 (4 5) ;                                                                                                                                                                                                                                    
  do over ii ;                                                                                                                                                                                                                                                  
    do over jj ;                                                                                                                                                                                                                                                
      put ii= jj= ;                                                                                                                                                                                                                                             
    end ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
run ;

Result:

i1=1 j1=4
i1=1 j2=5
i2=2 j1=4
i2=2 j2=5
i3=3 j1=4
i3=3 j2=5

Moreover, you can create an implicit array that contains other implicit arrays as its elements - the ability woefully absent from explicitly subscripted arrays. SAS User's Guide: Basics, Version 5 Edition, pages 48-49 (45-49 about the implicit arrays in general). The functionality, despite being as of 1985, still works like a champ.

Kind regards

Paul D.

PeterClemmensen · Posted 10-08-2019 06:49 AM

@hashman I'm curious here. What does an implicit array, that contains other implicit arrays look like?

The DATA to DATA Step Macro
Blog: SASnrd

hashman · Posted 10-08-2019 08:10 PM

@PeterClemmensen:

Curiosity must be satisfied ;). Don't have my V5 Guide with me, so just off the top of my head:

data _null_ ;                                                                                                                                                                                                                                                   
  array a (i) a1-a4 (11 21 31 41) ;                                                                                                                                                                                                                             
  array b (i) b1-b4 (21 22 23 24) ;                                                                                                                                                                                                                             
  array c (i) c1-c4 (31 32 33 34) ;                                                                                                                                                                                                                             
  array abc a b c ;                                                                                                                                                                                                                                             
  do over abc ;                                                                                                                                                                                                                                                 
    do i = 1 to dim (a) ;                                                                                                                                                                                                                                       
      put abc @ ;                                                                                                                                                                                                                                               
    end ;                                                                                                                                                                                                                                                       
    put ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
  put ;                                                                                                                                                                                                                                                         
  do i = 1 to dim (a) ;                                                                                                                                                                                                                                         
    do over abc ;                                                                                                                                                                                                                                               
      put abc @ ;                                                                                                                                                                                                                                               
    end ;                                                                                                                                                                                                                                                       
    put ;                                                                                                                                                                                                                                                       
  end ;                                                                                                                                                                                                                                                         
run ;

Which prints in the log:

11 21 31 41
21 22 23 24
31 32 33 34

11 21 31
21 22 32
31 23 33
41 24 34

Kind regards

Paul D.

PeterClemmensen · Posted 10-13-2019 07:21 AM

@hashman very cool. Thank you!

The DATA to DATA Step Macro
Blog: SASnrd

hashman · Posted 10-07-2019 03:42 PM

@maozsoy:

As @Ksharp has pointed out, you just need to specify the SCORE and FEE variables to both arrays in the same suffix order. Which one - doesn't matter, as long as it's the same.

You can simply hard code the array variables in the desired order. (Essentially, the macro invoked in your accepted solution does the same thing, except that you don't have to specify the prefixes.) But this is hardly a satisfactory solution since, as you've informed us, the list of the variables can be 100+ long: Are you going to type them all or type their suffixes into the macro call?

Good code is dynamic and shouldn't depend on this kind of vagaries. In this case, the simplest is to pre-create two SCORE and FEE variable lists, both ordered by the variable suffix and feed them into the already working code I offered originally. This way, the solution doesn't depend at all on the order in which the SCORE and FEE variables are listed in HAVE as long as for each FEE suffix there's the same and only one SCORE suffix:

data have ;                                                                                                                                                                                                                                                     
  input id score_a score_d score_c score_b                                                                                                                                                                                                                      
           fee_a   fee_b   fee_c   fee_d                                                                                                                                                                                                                        
  ;                                                                                                                                                                                                                                                             
  cards ;                                                                                                                                                                                                                                                       
111111  70   60  50  85  2  4  3  8                                                                                                                                                                                                                             
222222  80   70  80  65  9  4  5  1                                                                                                                                                                                                                             
333333  60  100  70  70  5  3  5  4                                                                                                                                                                                                                             
444444  60  100 100  90  7  2  3  2                                                                                                                                                                                                                             
555555  40   50  80  70  4  1  5  1                                                                                                                                                                                                                             
666666  75   60  60  80  9  3  2  2                                                                                                                                                                                                                             
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
proc sql noprint ;                                                                                                                                                                                                                                              
  select name into :s separated by " " from dictionary.columns                                                                                                                                                                                                  
  where  libname="WORK" and memname="HAVE" and lower (name) like 'score_%'                                                                                                                                                                                      
  order  lower (name) ;                                                                                                                                                                                                                                         
  select name into :f separated by " " from dictionary.columns                                                                                                                                                                                                  
  where  libname="WORK" and memname="HAVE" and lower (name) like 'fee_%'                                                                                                                                                                                        
  order  lower (name) ;                                                                                                                                                                                                                                         
quit ;                                                                                                                                                                                                                                                          
                                                                                                                                                                                                                                                                
data want (drop = _:) ;                                                                                                                                                                                                                                         
  set have ;                                                                                                                                                                                                                                                    
  array s &s ;                                                                                                                                                                                                                                                  
  array f &f ;                                                                                                                                                                                                                                                  
  do over s ;                                                                                                                                                                                                                                                   
    if f ne min (of f[*]) or s <= _ms then continue ;                                                                                                                                                                                                           
    _ms = s ;                                                                                                                                                                                                                                                   
    assignment = put (scan (vname (f), 2, "_"), $upcase32.) ;                                                                                                                                                                                                   
  end ;                                                                                                                                                                                                                                                         
run ;

It does involve a couple of macro variables but it's not at all the same as using a macro.

However, in reality you don't have to even use any macro variables at all - indeed, don't even need to have the variable name lists sorted. In this case, it simply means that for a given FEE_suffix you need to locate the SCORE_suffix variable with the same suffix, which in turn means that the SCORE suffixes and their corresponding array S indices should be stored in some kind of lookup table. Below, this table takes on the form of the temporary array Z with enough elements to cover the territory (i.e. it is assumed that you have fewer than 10000 SCORE and FEE variables - if not enough, just add another 9 to the array size):

data want (drop = _:) ;                                                                                                                                                                                                                                         
  set have ;                                                                                                                                                                                                                                                    
  array s (_s) score_: ;                                                                                                                                                                                                                                        
  array f (_f) fee_: ;                                                                                                                                                                                                                                          
  array z [9999] $ 26 _temporary_ ;                                                                                                                                                                                                                             
  if _n_ = 1 then do over s ;                                                                                                                                                                                                                                   
    z[_s] = upcase (substr (vname (s), 1 + find (vname (s), "_"))) ;                                                                                                                                                                                            
  end ;                                                                                                                                                                                                                                                         
  length _sfx assignment $ 26 ; *32-length('score') ;                                                                                                                                                                                                           
  do over f ;                                                                                                                                                                                                                                                   
    _sfx = upcase (substr (vname (f), 1 + find (vname (f), "_"))) ;                                                                                                                                                                                             
    _s = whichc (_sfx, of z[*]) ;                                                                                                                                                                                                                               
    if f ne min (of f[*]) or s <= _ms then continue ;                                                                                                                                                                                                           
    _ms = s ;                                                                                                                                                                                                                                                   
    assignment = _sfx ;                                                                                                                                                                                                                                         
  end ;                                                                                                                                                                                                                                                         
run ;

The code is basically the same as the original, except now the index _S is not the same as _F but is dictated by the suffix of FEE in the current iteration of the DO loop.

Some folks poo-poo the concept of allocating a "big enough" lookup table, even if it's well justified (as in the case above). But as of the current state of the SAS software, it doesn't have to be relied upon, and the table can be made dynamic instead:

data want (drop = _:) ;                                                            
  set have ;                                                                       
  array s (_s) score_: ;                                                           
  array f (_f) fee_: ;                                                             
  if _n_ = 1 then do ;                                                             
    dcl hash z () ;                                                                
    z.definekey ("_sfx") ;                                                         
    z.definedata ("_s") ;                                                          
    z.definedone () ;                                                              
    do over s ;                                                                    
      z.add (key:upcase (substr (vname (s), 1 + find (vname (s), "_"))), data:_s) ;
    end ;                                                                          
  end ;                                                                            
  length _sfx assignment $ 26 ; *32-length('score') ;                              
  do over f ;                                                                      
    _sfx = upcase (substr (vname (f), 1 + find (vname (f), "_"))) ;                
    z.find() ;                                                                     
    if f ne min (of f[*]) or s <= _ms then continue ;                              
    _ms = s ;                                                                      
    assignment = _sfx ;                                                            
  end ;                                                                            
run ;

Above the index _S needed to ID the required SCORE variable is extracted from the hash table Z instead of array Z. The result, of course, is the same, except that with a large number of variables SCORE and FEE, the hash table is faster since it searches for _SFX in O(1) time, while WHICHN does the same in O(N) time.

Kind regards

Paul D.

Ksharp · Posted 10-06-2019 07:11 AM

data have ;                                                         
  input id score_a score_b score_c score_d fee_a fee_b fee_c fee_d ;
  cards ;                                                           
111111  70  85  50    60    2  4  3  8                              
222222  80  65  80    70    9  4  5  1                              
333333  60  70  70    100   5  3  5  4                              
444444  60  90  100   100   7  2  3  2                              
555555  40  70  80    50    4  1  5  1                              
666666  75  80  60    60    9  3  2  2                              
;                                                                   
run ;                                                               
                                                                    
data want ;                                             
  set have ;                                                        
  array s{*} score_: ;                                                 
  array f{*}fee_: ; 
min=min(of f{*});
do i=1 to dim(f);
 if f{i}=min and s{i}>max then do;max=s{i};want=scan(vname(f{i}),-1,'_'); end;
end;
drop i min max;
run;

maozsoy · Posted 10-07-2019 03:14 AM

@Ksharp I am very pleased for your contributions., thank you.

But unfortunately this structure don't work when "score_..." columns and "fee_..." columns are shuffled. For example if my data is like below, it doesn't result that i want. (I changed the order of 'score_d' and 'score_b')

data have1 ;                                                         
  input id score_a score_d score_c score_b fee_a fee_b fee_c fee_d ;
  cards ;                                                           
111111  70  60  50   85   2  4  3  8                              
222222  80  70  80   65   9  4  5  1                              
333333  60  100 70   70   5  3  5  4                              
444444  60  100 100  90   7  2  3  2                              
555555  40  50  80   70   4  1  5  1                              
666666  75  60  60   80   9  3  2  2                              
;                                                                   
run ;

I think I need the code like below but I couldn't define macro. How can i do get macro in a loop?

data want ;                                             
  set have1 ;                                                        
  array s{*} score_: ;                                                 
  array f{*}fee_: ; 

counter = 0;
do i=1 to dim(f);
if f{i} = min(of f(*)) then counter = counter + 1; else counter = counter;
end;

min=min(of f{*});

length wants $ 500;

do i=1 to dim(f);
 if f{i}=min then do;
		 if counter <= 1 then want=scan(vname(f{i}),-1,'_');
	else if counter >  1 then do;
            wants=catx(' ', wants, scan(vname(f{i}),-1,'_') ) into:array_wants separated by ' ';
            array w [*] &array_wants.;

            do i=1 to dim(w);
	    if w{i}=max(of w{*}) then want=scan(vname(w{i}),-1,'_');
            end;
          end;
	end;
end;



drop i min max;
run;

Ksharp · Posted 10-07-2019 07:13 AM

As Paul said, specified arrays explicitly :

array s{*} score_a score_b score_c score_d;                                                 
array f{*} fee_a fee_b fee_c fee_d;

hashman · Posted 10-07-2019 01:00 PM

@Ksharp:

You said that.

I should have said that.

Kind regards

Paul D.

Register Today!

SAS Training: Just a Click Away