Suppose I have 100 columns, sorted by id and time.
My cols are conveniently named col1, col2, ..., col100.
This seems like a really stupid question, but how would I create a new variable which multiples all the columns, and then adds 10 at the end?
I tried using do loops, but it just confused me. I know I can just manually type it out, but
Basically, I want:
new_variable = (col1)x(col2)x...x(col100)+10
Thanks!
@UniversitySas wrote:
Quick question - what does the **dim(col) do?
dim(col) is the number of elements in the array col, i.e. 100, and ** is the exponentiation operator. The hundredth root of the product (=the geometric mean, if applicable) is raised to the hundredth power to retrieve the product.
In this case, however, I may potentially have a final negative value across each row, so I don't think geometric mean will work here.
Are you aware of a work around?
A "final negative value" -- do you mean that only the last factor, col100, might be negative?
If so, you can modify the formula as follows:
new_variable=sign(col100)*geomean(of col1-col99, abs(col100))**100+10;
This would be valid irrespective of the sign of col100, as long as col1, ..., col99 are non-negative.
If negative values could occur anywhere, then use a DO loop:
data want;
set have;
array col[100];
new_variable=1;
do _n_=1 to 100;
new_variable=new_variable*col[_n_];
end;
new_variable+10;
run;
Hello @UniversitySas,
You can use the GEOMEAN function:
data want;
set have;
array col[100];
new_variable=geomean(of col[*])**dim(col)+10;
run;
provided that none of the factors is negative.
Shorter without array:
data want;
set have;
new_variable=geomean(of col1-col100)**100+10;
run;
Quick question - what does the **dim(col) do?
In this case, however, I may potentially have a final negative value across each row, so I don't think geometric mean will work here.
Are you aware of a work around?
@UniversitySas wrote:
Quick question - what does the **dim(col) do?
dim(col) is the number of elements in the array col, i.e. 100, and ** is the exponentiation operator. The hundredth root of the product (=the geometric mean, if applicable) is raised to the hundredth power to retrieve the product.
In this case, however, I may potentially have a final negative value across each row, so I don't think geometric mean will work here.
Are you aware of a work around?
A "final negative value" -- do you mean that only the last factor, col100, might be negative?
If so, you can modify the formula as follows:
new_variable=sign(col100)*geomean(of col1-col99, abs(col100))**100+10;
This would be valid irrespective of the sign of col100, as long as col1, ..., col99 are non-negative.
If negative values could occur anywhere, then use a DO loop:
data want;
set have;
array col[100];
new_variable=1;
do _n_=1 to 100;
new_variable=new_variable*col[_n_];
end;
new_variable+10;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.