Hi,
I have a macro which creates a dataset which size depends on the input :
Columns names : a_1, a_2, a_3, a_4 ... .
I want to compute the mean of all the columns but the first two in the macro :
DATA want;
SET have;
mean_value=mean(a_3, a_4 ....);
RUN;
How can I dynamically select the columns ? (I know all the columns' name)
Maybe with columns' index/position (Mean on the 3rd column to the last one) ?
Gluttony
Well, if the variables of interest all begin with a_ then you can use something like this:
mean_value = mean(of a_3-a_19);
Well, if the variables of interest all begin with a_ then you can use something like this:
mean_value = mean(of a_3-a_19);
Is your issue that you do not know how many a_ variables there are and need help determining that?
Will you guarantee that there will always be 3 or more values?
Then perhaps:
data want; set have; array aa a_: ; mean_value = (sum (of aa(*)) - sum(a_1,a_2))/(n (of aa(*)) - 2); run;
but you would need to check how many values are in the AA array if you can't be sure there are always 3 or more.
@ballardw wrote:
Is your issue that you do not know how many a_ variables there are and need help determining that?
Will you guarantee that there will always be 3 or more values?
Then perhaps:
data want; set have; array aa a_: ; mean_value = (sum (of aa(*)) - sum(a_1,a_2))/(n (of aa(*)) - 2); run;but you would need to check how many values are in the AA array if you can't be sure there are always 3 or more.
Doesn't work if a_1 and/or a_2 is missing, the denominator will be wrong.
@PaigeMiller wrote:
@ballardw wrote:
Is your issue that you do not know how many a_ variables there are and need help determining that?
Will you guarantee that there will always be 3 or more values?
Then perhaps:
data want; set have; array aa a_: ; mean_value = (sum (of aa(*)) - sum(a_1,a_2))/(n (of aa(*)) - 2); run;but you would need to check how many values are in the AA array if you can't be sure there are always 3 or more.
Doesn't work if a_1 and/or a_2 is missing, the denominator will be wrong.
Yep.
Incomplete data description => incomplete response.
In my case, the columns were :
a_1, a_2, b_1, b_2, b_3 .... b_n
and I wanted to compute the mean on the "b" columns,
So @PaigeMiller's trick (before the edit) about using :
mean(of b:)
is perfect.
Thank you,
Gluttony
as long as there are no other variables whose name starts with B in your data set, that would work. For example, if your data set also contains variable BirthYear then the B: in the MEAN function won't work.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.