Solved: Re: Multiply column variables to create a new variable. PROC IML or SA...

dammie_101 · Posted 05-22-2019 07:04 AM

Good day everyone.

Please i need some advice, as well as help with something.

I want to create a new variable by asking SAS to multiply the values in the cells of input columns, and repeat this process for all observations. As it in the code below...I want to ask SAS to start with obs 1 (row 1) and multiply the values of grp1 - grp5 to create new variable "rowproduct" with the result in that cell. Then it should go on to obs 2 (row 2)...etc until the end.

Will the IML procedure be better suited for this or will it more appropriate to use ARRAY?

Example code to solve this will be highly appreciated.

Thank you very much.

Dami.

data test;
input TargetID	grp1	grp2	grp3	grp4	grp5;
datalines;
7314326	88103	102695	118879	82913	95039
10031881	253	255	74	36	442
7213321	3431	5344	1134	637	4054
7213330	2765	4184	1143	635	2722
7214940	447	269	11328	9417	145
;
run;

Kurt_Bremser · Posted 05-23-2019 05:22 AM

data want;
set test;

array myarr {*} grp:;
/* defines an array over all variables the name of which starts with grp */
/* the dimension (*) is dynamic, dependent on the number of variables found */
/* the colon can be used as wildcard character in SAS code for variables or dataset names */
/* but can only be used at the end */

rowproduct = 1;
/* initializes the new variable, as a simple calculation on missing values would result in missing values */

do i = 1 to dim(myarr); /* upper bound depends on number of variables found */
  rowproduct = rowproduct * coalesce(myarr{i},1);
  /* coalesce() returns the first non-missing argument */
  /* prevents missing values to destroy our result */

end;
drop i;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 05-22-2019 07:22 AM

Use an array:

data want;
set test;
array myarr {*} grp:;
rowproduct = 1;
do i = 1 to dim(myarr);
  rowproduct = rowproduct * coalesce(myarr{i},1);
end;
drop i;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

dammie_101 · Posted 05-22-2019 09:40 AM

This works perfectly!

I am grateful.

Thank you so much!

Kurt_Bremser · Posted 05-22-2019 09:57 AM

@dammie_101 wrote:

This works perfectly!

I am grateful.

Thank you so much!

Fine! Is the code clear to you, or do you need additional explanation?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

dammie_101 · Posted 05-22-2019 10:12 AM

i think i get the logic:

SAS scans and calls in all variables from the input dataset into array procedure;

set column 'rowproduct' to an initial of 1;

initiate array procedure from column 1 until last element in the array dimension;

rowproduct => (a) call initial value of rowproduct...multiply this with (b) call cell values in array dimension, multiply values) until last entry;

dammie_101 · Posted 05-23-2019 05:10 AM

I am open to learning more. I absolutely would not mind additional explanation.

@Kurt_Bremser wrote:
@dammie_101 wrote:
This works perfectly!
I am grateful.
Thank you so much!
Fine! Is the code clear to you, or do you need additional explanation?

Kurt_Bremser · Posted 05-23-2019 05:22 AM

data want;
set test;

array myarr {*} grp:;
/* defines an array over all variables the name of which starts with grp */
/* the dimension (*) is dynamic, dependent on the number of variables found */
/* the colon can be used as wildcard character in SAS code for variables or dataset names */
/* but can only be used at the end */

rowproduct = 1;
/* initializes the new variable, as a simple calculation on missing values would result in missing values */

do i = 1 to dim(myarr); /* upper bound depends on number of variables found */
  rowproduct = rowproduct * coalesce(myarr{i},1);
  /* coalesce() returns the first non-missing argument */
  /* prevents missing values to destroy our result */

end;
drop i;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

dammie_101 · Posted 05-23-2019 06:47 AM

Thank you so very much!

I am deeply grateful.

FreelanceReinh · Posted 05-22-2019 07:36 AM

Or keep it simple:

data want;
set test;
rowproduct=grp1*grp2*grp3*grp4*grp5;
run;

A shorter alternative (especially if the product involved more than only 5 "GRP" variables) would be something like

rowproduct=geomean(of grp:)**5;

where all variables would be multiplied whose names start with "GRP" (use grp1-grp5 if grp: was too comprehensive) and the exponent (5) is the number of these variables. Restriction: The GEOMEAN function requires non-negative arguments. However, if missing values could occur, the exponent would need to be the number of non-missing values:

rowproduct=geomean(of grp:)**n(of grp:);

(The GEOMEAN function ignores missing values unlike the explicit product).

dammie_101 · Posted 05-22-2019 09:18 AM

how about in a situation where the column variables have different names? As an example instead of grp1, grp2, grp3 etc the variable names are liverKO kidneyKO bladderKO heartKO lungsKO. (I am going to get to that point in the next couple of weeks in the course of my experiments, hence why i am asking).

what would the line statement

rowproduct = geomean(of grp:)**5

look like?

Thanks in advance.

FreelanceReinh · Posted 05-22-2019 11:29 AM

@dammie_101 wrote:

how about in a situation where the column variables have different names? As an example instead of grp1, grp2, grp3 etc the variable names are liverKO kidneyKO bladderKO heartKO lungsKO.

what would the line statement
rowproduct = geomean(of grp:)**5
look like?

Thanks in advance.

If those five variables are adjacent in dataset TEST, i.e., if their variable numbers in PROC CONTENTS output of TEST (see column "#"; ideally use the VARNUM option) are consecutive and, let's say, liverKO is the first and lungsKO the last variable in that order, you can write

rowproduct=geomean(of liverKO--lungsKO)**5;

(assuming there are no missing values, as explained earlier; otherwise replace "5" by "n(of liverKO--lungsKO)" if you want to obtain the product of the non-missing values in this case).

A similar notation (liverKO-numeric-lungsKO) would apply if the sequence of the five variables was only interrupted by character variables.

Alternatively, you can define an array (see the ARRAY statement in KurtBremser's post, where "grp:" would be replaced either by the list of all five variable names or, if applicable, by the variable list in the notation I've just described). Then the formula would read

rowproduct=geomean(of myarr{*})**5;

(or, again, with n(of myarr{*}) as the exponent).

That said, if the number of factors is as small as five, I would prefer the simple definition

rowproduct=liverKO*kidneyKO*bladderKO*heartKO*lungsKO;

which is much easier to read, even if the COALESCE function (see KurtBremser's solution) was applied to each factor in order to ignore missing values.

dammie_101 · Posted 05-22-2019 04:38 PM

understood.

I am sincerely grateful for taking time to explain to that depth of detail.

Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

Re: Multiply column variables to create a new variable. PROC IML or SAS ARRAY?

SAS Innovate 2025: Call for Content