Good day everyone.
Please i need some advice, as well as help with something.
I want to create a new variable by asking SAS to multiply the values in the cells of input columns, and repeat this process for all observations. As it in the code below...I want to ask SAS to start with obs 1 (row 1) and multiply the values of grp1 - grp5 to create new variable "rowproduct" with the result in that cell. Then it should go on to obs 2 (row 2)...etc until the end.
Will the IML procedure be better suited for this or will it more appropriate to use ARRAY?
Example code to solve this will be highly appreciated.
Thank you very much.
Dami.
data test;
input TargetID	grp1	grp2	grp3	grp4	grp5;
datalines;
7314326	88103	102695	118879	82913	95039
10031881	253	255	74	36	442
7213321	3431	5344	1134	637	4054
7213330	2765	4184	1143	635	2722
7214940	447	269	11328	9417	145
;
run;
data want;
set test;
array myarr {*} grp:;
/* defines an array over all variables the name of which starts with grp */
/* the dimension (*) is dynamic, dependent on the number of variables found */
/* the colon can be used as wildcard character in SAS code for variables or dataset names */
/* but can only be used at the end */
rowproduct = 1;
/* initializes the new variable, as a simple calculation on missing values would result in missing values */
do i = 1 to dim(myarr); /* upper bound depends on number of variables found */
  rowproduct = rowproduct * coalesce(myarr{i},1);
  /* coalesce() returns the first non-missing argument */
  /* prevents missing values to destroy our result */
end;
drop i;
run;Use an array:
data want;
set test;
array myarr {*} grp:;
rowproduct = 1;
do i = 1 to dim(myarr);
  rowproduct = rowproduct * coalesce(myarr{i},1);
end;
drop i;
run;This works perfectly!
I am grateful.
Thank you so much!
@dammie_101 wrote:
This works perfectly!
I am grateful.
Thank you so much!
Fine! Is the code clear to you, or do you need additional explanation?
i think i get the logic:
SAS scans and calls in all variables from the input dataset into array procedure;
set column 'rowproduct' to an initial of 1;
initiate array procedure from column 1 until last element in the array dimension;
rowproduct => (a) call initial value of rowproduct...multiply this with (b) call cell values in array dimension, multiply values) until last entry;
I am open to learning more. I absolutely would not mind additional explanation.
@Kurt_Bremser wrote:
@dammie_101 wrote:This works perfectly!
I am grateful.
Thank you so much!
Fine! Is the code clear to you, or do you need additional explanation?
data want;
set test;
array myarr {*} grp:;
/* defines an array over all variables the name of which starts with grp */
/* the dimension (*) is dynamic, dependent on the number of variables found */
/* the colon can be used as wildcard character in SAS code for variables or dataset names */
/* but can only be used at the end */
rowproduct = 1;
/* initializes the new variable, as a simple calculation on missing values would result in missing values */
do i = 1 to dim(myarr); /* upper bound depends on number of variables found */
  rowproduct = rowproduct * coalesce(myarr{i},1);
  /* coalesce() returns the first non-missing argument */
  /* prevents missing values to destroy our result */
end;
drop i;
run;Thank you so very much!
I am deeply grateful.
Or keep it simple:
data want;
set test;
rowproduct=grp1*grp2*grp3*grp4*grp5;
run;A shorter alternative (especially if the product involved more than only 5 "GRP" variables) would be something like
rowproduct=geomean(of grp:)**5;where all variables would be multiplied whose names start with "GRP" (use grp1-grp5 if grp: was too comprehensive) and the exponent (5) is the number of these variables. Restriction: The GEOMEAN function requires non-negative arguments. However, if missing values could occur, the exponent would need to be the number of non-missing values:
rowproduct=geomean(of grp:)**n(of grp:);(The GEOMEAN function ignores missing values unlike the explicit product).
how about in a situation where the column variables have different names? As an example instead of grp1, grp2, grp3 etc the variable names are liverKO kidneyKO bladderKO heartKO lungsKO. (I am going to get to that point in the next couple of weeks in the course of my experiments, hence why i am asking).
what would the line statement
rowproduct = geomean(of grp:)**5look like?
Thanks in advance.
@dammie_101 wrote:
how about in a situation where the column variables have different names? As an example instead of grp1, grp2, grp3 etc the variable names are liverKO kidneyKO bladderKO heartKO lungsKO.
what would the line statement
rowproduct = geomean(of grp:)**5look like?
Thanks in advance.
If those five variables are adjacent in dataset TEST, i.e., if their variable numbers in PROC CONTENTS output of TEST (see column "#"; ideally use the VARNUM option) are consecutive and, let's say, liverKO is the first and lungsKO the last variable in that order, you can write
rowproduct=geomean(of liverKO--lungsKO)**5;(assuming there are no missing values, as explained earlier; otherwise replace "5" by "n(of liverKO--lungsKO)" if you want to obtain the product of the non-missing values in this case).
A similar notation (liverKO-numeric-lungsKO) would apply if the sequence of the five variables was only interrupted by character variables.
Alternatively, you can define an array (see the ARRAY statement in KurtBremser's post, where "grp:" would be replaced either by the list of all five variable names or, if applicable, by the variable list in the notation I've just described). Then the formula would read
rowproduct=geomean(of myarr{*})**5;(or, again, with n(of myarr{*}) as the exponent).
That said, if the number of factors is as small as five, I would prefer the simple definition
rowproduct=liverKO*kidneyKO*bladderKO*heartKO*lungsKO;which is much easier to read, even if the COALESCE function (see KurtBremser's solution) was applied to each factor in order to ignore missing values.
understood.
I am sincerely grateful for taking time to explain to that depth of detail.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.
