Hello,
I am using SAS EG 7.1.
My purpose is find to minimum value of FEE_A - ... - FEE_D columns and print to column name as 'ASSIGNMENT'. If two or more minimum value is exist this time i want to print column name have maximum value of SCORE_A - ... SCORE_D columns.
My data is:
I try to add 'Assignment' column.
On first, second and third row; there are only one minimum value of Fee_A - ... - Fee_D.
On fourth and fifth row; there are equal minimum value of fee_.. columns (eg. for fourth row; FEE_B = FEE_D = 2) But SCORE_D > SCORE_B (100>90) so value of ASSIGNMENT column is D.
On sixth row; there are two minimum value (FEE_C = FEE_D = 2) and values of SCORE_C and SCORE_D are equal. (60) For this case it doesn't matter what print 'C' or 'D' value.
For insert my data:
data table_1;
input
ID SCORE_A SCORE_B SCORE_C SCORE_D
FEE_A FEE_B FEE_C FEE_D;
datalines;
111111 70 85 50 60 2 4 3 8
222222 80 65 80 70 9 4 5 1
333333 60 70 70 100 5 3 5 4
444444 60 90 100 100 7 2 3 2
555555 40 70 80 50 4 1 5 1
666666 75 80 60 60 9 3 2 2
;
run;
I try to write code below, but i could not. In addition, actually I have approximately 100 columns both 'score_...' and 'fee_...' so I have to use macro and array.
data work.table_2;
set work.table_1;
array all_fee [*] &fee.; /*i create this macro before*/
counter = 0;
do i=1 to dim(all_fee);
if all_fee{i} = min(of all_fee(*)) then counter = counter + 1; else counter = counter;
end;
length multi_min $ 500;
length ASSIGNMENT $ 500;
do i=1 to dim(all_fee);
if counter = 0 then assignment = '' ; end;
else if counter = 1 then do;
if all_fee{i} = min(of all_fee(*)) then do;
assignment = substr((vname(all_fee{i})), 5, (length((vname(all_fee{i})))-4));,
end;
end;
else if counter > 1 then do;
if all_fee{i} = min(of all_fee(*)) then do;
multi_min = catx(' ', multi_min, (substr((vname(all_fee{i})), 5, (length((vname(all_fee{i})))-4))) );
end;
end;
else assignment = .;
end;
drop i;
run;
How should I write this code or Is there any shorter way? Can you help me?
Best Regards,
Thank you
If you are working with shuffled array (name suffixes not in the same order in the two sets), a bit of macro stuff may be necessary:
%macro listvars(prefix,suffixes);
%local i w;
%do i=1 %to 999999;
%let w=%scan(&suffixes,&i);
%if %length(&w)=0 %then %return;
%do; &prefix.&w%end;
%end;
%mend;
data want (drop = _:) ;
set have ;
array s %listvars(SCORE_,A B C D);
array f %listvars(FEE_,A B C D);
do over s ;
if f ne min (of f[*]) or s <= _ms then continue ;
_ms = s ;
assignment = put (scan (vname (f), 2, "_"), $upcase32.) ;
end ;
run ;
Apart from the new array definitions, the solution is the one presented by @hashman. The macro just generates a list like " FEE_A FEE_B FEE_C FEE_D". You can also skip the macro and write the full lists in your program, if you prefer.
Regarding your other questions:
In principle, you seem to head in the right direction by using arrays. But there's no need for macros of any kind, sort, or shape: You need to only diligently execute a straightforward piece of array logic. Also, using A, B, etc. as values for ASSIGNMENT is precarious: Since, as you say, you have 100+ variables like SCORE* and FEE* in your real file, you don't have enough letters to mark all of them apart. Perhaps you have combinations of letters and/or digits after the underscore in the variables' names, but you aren't telling us. At any rate, below, the part of the variable name after the underscore is used, whatever it is, which precludes any such collisions.
data have ;
input id score_a score_b score_c score_d fee_a fee_b fee_c fee_d ;
cards ;
111111 70 85 50 60 2 4 3 8
222222 80 65 80 70 9 4 5 1
333333 60 70 70 100 5 3 5 4
444444 60 90 100 100 7 2 3 2
555555 40 70 80 50 4 1 5 1
666666 75 80 60 60 9 3 2 2
;
run ;
data want (drop = _:) ;
set have ;
array s score_: ;
array f fee_: ;
do over s ;
if f ne min (of f[*]) or s <= _ms then continue ;
_ms = s ;
assignment = put (scan (vname (f), 2, "_"), $upcase32.) ;
end ;
run ;
Note that the "max" logic vis-a-vis the auxiliary _MS variable is partly made possible by the fact that the unretained _MS is auto-set to a missing value at the top of the DATA step implied loop.
Kind regards
Paul D.
@hashman I am very pleased for your contributions.
Unfortunately i couldn't understand what exactly difference "do over" and "do i=1 to dim(s)". And I guess "... then continue;" reference the next two lines, right it?
In addition; this structure don't work when "score_..." columns and "fee_..." columns are shuffled. For example if my data is like below, it doesn't result that i want. (I changed the order of 'score_d' and 'score_b')
data have1 ; input id score_a score_d score_c score_b fee_a fee_b fee_c fee_d ; cards ; 111111 70 60 50 85 2 4 3 8 222222 80 70 80 65 9 4 5 1 333333 60 100 70 70 5 3 5 4 444444 60 100 100 90 7 2 3 2 555555 40 50 80 70 4 1 5 1 666666 75 60 60 80 9 3 2 2 ; run ;
When the array is defined as:
array s score_: ;
it automatically creates an implicitly subscripted array S with the array index as an automatic variable _I_ (which is automatically dropped from the output). Then:
do over s ;
...
reference to s ...
...
end ;
means the same as:
do _i_ = 1 to dim (s) ;
...
reference to s[_i_] ...
...
end ;
So. basically, there's no difference between the @Ksharp's code and mine in this respect, except that he's using an explicitly subsripted array.
CONTINUE within a DO loop means: If the condition preceding it is true, pass program control to the bottom of the loop (i.e. the location right before END), ignoring all instructions between it and the bottom of the loop, and go to the next iteration of the loop.
If you shuffle either SCORE* or FEE* variables, so that their suffixes don't align, you redefine the task and need to specify which variable - SCORE* or FEE* - should supply its suffix to ASSIGNMENT after the minimal FEE is found for the maximum SCORE that resolves the ties among more than one minimal FEE.
I don't understand why you keep referencing some kind of macro. Macros have nothing to do with this task or code. Remember that macros are nothing but one of the means to assemble executable SAS code - and you have no code to assemble here. All you need is a straightforward (and correct according to the specs) piece of DATA step logic.
Kind regards
Paul D.
Thanks for explanation of 'array' and 'continue' templates, ok.
I am thinking i need to use macro. Because the order of columns may vary in the table. But my code should not be affected this case.
Actually i mean;
a, b, c... are my products that have names differently.
a ==> fee_a, score_a
b ==> fee_b, score_b
c ==> fee_c, score_c
Also they can called as;
pencil ==> fee_pencil, score_pencil (different features of the same product-pencil-)
book ==> fee_book, score_book (different features of the same product-book-)
rubber ==> fee_rubber, score_rubber (different features of the same product-rubber-)
So although their order is different (eg. score_d - score_b - fee_b - fee_d), my code structure should examine score_d for fee_d and should examine score_b for fee_b same way.
I hope i could explain 🙂
If you are working with shuffled array (name suffixes not in the same order in the two sets), a bit of macro stuff may be necessary:
%macro listvars(prefix,suffixes);
%local i w;
%do i=1 %to 999999;
%let w=%scan(&suffixes,&i);
%if %length(&w)=0 %then %return;
%do; &prefix.&w%end;
%end;
%mend;
data want (drop = _:) ;
set have ;
array s %listvars(SCORE_,A B C D);
array f %listvars(FEE_,A B C D);
do over s ;
if f ne min (of f[*]) or s <= _ms then continue ;
_ms = s ;
assignment = put (scan (vname (f), 2, "_"), $upcase32.) ;
end ;
run ;
Apart from the new array definitions, the solution is the one presented by @hashman. The macro just generates a list like " FEE_A FEE_B FEE_C FEE_D". You can also skip the macro and write the full lists in your program, if you prefer.
Regarding your other questions:
You can nest DO loops over implicit arrays all you want if you define the arrays with different indices, e.g.:
data _null_ ;
array ii (i) i1-i3 (1 2 3) ;
array jj (j) j1-j2 (4 5) ;
do over ii ;
do over jj ;
put ii= jj= ;
end ;
end ;
run ;
Result:
i1=1 j1=4 i1=1 j2=5 i2=2 j1=4 i2=2 j2=5 i3=3 j1=4 i3=3 j2=5
Moreover, you can create an implicit array that contains other implicit arrays as its elements - the ability woefully absent from explicitly subscripted arrays. SAS User's Guide: Basics, Version 5 Edition, pages 48-49 (45-49 about the implicit arrays in general). The functionality, despite being as of 1985, still works like a champ.
Kind regards
Paul D.
@hashman I'm curious here. What does an implicit array, that contains other implicit arrays look like?
Curiosity must be satisfied ;). Don't have my V5 Guide with me, so just off the top of my head:
data _null_ ;
array a (i) a1-a4 (11 21 31 41) ;
array b (i) b1-b4 (21 22 23 24) ;
array c (i) c1-c4 (31 32 33 34) ;
array abc a b c ;
do over abc ;
do i = 1 to dim (a) ;
put abc @ ;
end ;
put ;
end ;
put ;
do i = 1 to dim (a) ;
do over abc ;
put abc @ ;
end ;
put ;
end ;
run ;
Which prints in the log:
11 21 31 41 21 22 23 24 31 32 33 34 11 21 31 21 22 32 31 23 33 41 24 34
Kind regards
Paul D.
@hashman very cool. Thank you!
As @Ksharp has pointed out, you just need to specify the SCORE and FEE variables to both arrays in the same suffix order. Which one - doesn't matter, as long as it's the same.
You can simply hard code the array variables in the desired order. (Essentially, the macro invoked in your accepted solution does the same thing, except that you don't have to specify the prefixes.) But this is hardly a satisfactory solution since, as you've informed us, the list of the variables can be 100+ long: Are you going to type them all or type their suffixes into the macro call?
Good code is dynamic and shouldn't depend on this kind of vagaries. In this case, the simplest is to pre-create two SCORE and FEE variable lists, both ordered by the variable suffix and feed them into the already working code I offered originally. This way, the solution doesn't depend at all on the order in which the SCORE and FEE variables are listed in HAVE as long as for each FEE suffix there's the same and only one SCORE suffix:
data have ;
input id score_a score_d score_c score_b
fee_a fee_b fee_c fee_d
;
cards ;
111111 70 60 50 85 2 4 3 8
222222 80 70 80 65 9 4 5 1
333333 60 100 70 70 5 3 5 4
444444 60 100 100 90 7 2 3 2
555555 40 50 80 70 4 1 5 1
666666 75 60 60 80 9 3 2 2
;
run ;
proc sql noprint ;
select name into :s separated by " " from dictionary.columns
where libname="WORK" and memname="HAVE" and lower (name) like 'score_%'
order lower (name) ;
select name into :f separated by " " from dictionary.columns
where libname="WORK" and memname="HAVE" and lower (name) like 'fee_%'
order lower (name) ;
quit ;
data want (drop = _:) ;
set have ;
array s &s ;
array f &f ;
do over s ;
if f ne min (of f[*]) or s <= _ms then continue ;
_ms = s ;
assignment = put (scan (vname (f), 2, "_"), $upcase32.) ;
end ;
run ;
It does involve a couple of macro variables but it's not at all the same as using a macro.
However, in reality you don't have to even use any macro variables at all - indeed, don't even need to have the variable name lists sorted. In this case, it simply means that for a given FEE_suffix you need to locate the SCORE_suffix variable with the same suffix, which in turn means that the SCORE suffixes and their corresponding array S indices should be stored in some kind of lookup table. Below, this table takes on the form of the temporary array Z with enough elements to cover the territory (i.e. it is assumed that you have fewer than 10000 SCORE and FEE variables - if not enough, just add another 9 to the array size):
data want (drop = _:) ;
set have ;
array s (_s) score_: ;
array f (_f) fee_: ;
array z [9999] $ 26 _temporary_ ;
if _n_ = 1 then do over s ;
z[_s] = upcase (substr (vname (s), 1 + find (vname (s), "_"))) ;
end ;
length _sfx assignment $ 26 ; *32-length('score') ;
do over f ;
_sfx = upcase (substr (vname (f), 1 + find (vname (f), "_"))) ;
_s = whichc (_sfx, of z[*]) ;
if f ne min (of f[*]) or s <= _ms then continue ;
_ms = s ;
assignment = _sfx ;
end ;
run ;
The code is basically the same as the original, except now the index _S is not the same as _F but is dictated by the suffix of FEE in the current iteration of the DO loop.
Some folks poo-poo the concept of allocating a "big enough" lookup table, even if it's well justified (as in the case above). But as of the current state of the SAS software, it doesn't have to be relied upon, and the table can be made dynamic instead:
data want (drop = _:) ;
set have ;
array s (_s) score_: ;
array f (_f) fee_: ;
if _n_ = 1 then do ;
dcl hash z () ;
z.definekey ("_sfx") ;
z.definedata ("_s") ;
z.definedone () ;
do over s ;
z.add (key:upcase (substr (vname (s), 1 + find (vname (s), "_"))), data:_s) ;
end ;
end ;
length _sfx assignment $ 26 ; *32-length('score') ;
do over f ;
_sfx = upcase (substr (vname (f), 1 + find (vname (f), "_"))) ;
z.find() ;
if f ne min (of f[*]) or s <= _ms then continue ;
_ms = s ;
assignment = _sfx ;
end ;
run ;
Above the index _S needed to ID the required SCORE variable is extracted from the hash table Z instead of array Z. The result, of course, is the same, except that with a large number of variables SCORE and FEE, the hash table is faster since it searches for _SFX in O(1) time, while WHICHN does the same in O(N) time.
Kind regards
Paul D.
data have ;
input id score_a score_b score_c score_d fee_a fee_b fee_c fee_d ;
cards ;
111111 70 85 50 60 2 4 3 8
222222 80 65 80 70 9 4 5 1
333333 60 70 70 100 5 3 5 4
444444 60 90 100 100 7 2 3 2
555555 40 70 80 50 4 1 5 1
666666 75 80 60 60 9 3 2 2
;
run ;
data want ;
set have ;
array s{*} score_: ;
array f{*}fee_: ;
min=min(of f{*});
do i=1 to dim(f);
if f{i}=min and s{i}>max then do;max=s{i};want=scan(vname(f{i}),-1,'_'); end;
end;
drop i min max;
run;
@Ksharp I am very pleased for your contributions., thank you.
But unfortunately this structure don't work when "score_..." columns and "fee_..." columns are shuffled. For example if my data is like below, it doesn't result that i want. (I changed the order of 'score_d' and 'score_b')
data have1 ; input id score_a score_d score_c score_b fee_a fee_b fee_c fee_d ; cards ; 111111 70 60 50 85 2 4 3 8 222222 80 70 80 65 9 4 5 1 333333 60 100 70 70 5 3 5 4 444444 60 100 100 90 7 2 3 2 555555 40 50 80 70 4 1 5 1 666666 75 60 60 80 9 3 2 2 ; run ;
I think I need the code like below but I couldn't define macro. How can i do get macro in a loop?
data want ;
set have1 ;
array s{*} score_: ;
array f{*}fee_: ;
counter = 0;
do i=1 to dim(f);
if f{i} = min(of f(*)) then counter = counter + 1; else counter = counter;
end;
min=min(of f{*});
length wants $ 500;
do i=1 to dim(f);
if f{i}=min then do;
if counter <= 1 then want=scan(vname(f{i}),-1,'_');
else if counter > 1 then do;
wants=catx(' ', wants, scan(vname(f{i}),-1,'_') ) into:array_wants separated by ' ';
array w [*] &array_wants.;
do i=1 to dim(w);
if w{i}=max(of w{*}) then want=scan(vname(w{i}),-1,'_');
end;
end;
end;
end;
drop i min max;
run;
As Paul said, specified arrays explicitly :
array s{*} score_a score_b score_c score_d;
array f{*} fee_a fee_b fee_c fee_d;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.