I am using SAS 9.4 I would like to sum a number of variables, pop1 to popNN, where the greatest number, NN, will differ with every run. Is there a way to reference a number dynamically in a sum(of pop1--popNN) function? Can I use a macro variable here, like sum(of pop1--pop¯ovariable.) ?
thanks.
You can do something like this:
data want;
set have;
sum_of_pop=sum(of pop:);
run;
which tells SAS to sum all variables whose name begins with the letters "POP". SAS will figure out how many such variables there are.
Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
You can use colon wildcard option here, assuming you have a unique and consistent prefix.
sum(of pop:)
@GoBlueSAS wrote:
I am using SAS 9.4 I would like to sum a number of variables, pop1 to popNN, where the greatest number, NN, will differ with every run. Is there a way to reference a number dynamically in a sum(of pop1--popNN) function? Can I use a macro variable here, like sum(of pop1--pop¯ovariable.) ?
thanks.
The colon wildcard is the best option if the prefixes are suitable.
Otherwise, if you can create a missing variable after POPnn (let's call it LASTVAR), and if the POP variables are consecutive, you can use
sum (of POP1--LASTVAR)
You should also consider working with a long dataset layout, where you have only one variable and sum vertically as needed.
Yet another take on this.
The colon wildcard ,sum(of pop:), is the simplest, but it may not work if you have other variables that begin with POP.
The double dash approach ,sum(of pop--lastvar) only works if the POP variables are last in your input data.
A more safe approach may be this:
data want;
retain pop1-pop99 .;
set have;
sum=sum(of pop1-pop99);
run;
That way, all the variables will exist, and the order of the variables on the dataset or the names of other variables do not matter.
EDIT: changed the retain value from 0 to missing, so that means or standard deviations will also be calculated correctly.
@s_lassen wrote:
Yet another take on this.
The colon wildcard ,sum(of pop:), is the simplest, but it may not work if you have other variables that begin with POP.
The double dash approach ,sum(of pop--lastvar) only works if the POP variables are last in your input data.
A more safe approach may be this:
data want; retain pop1-pop99 .; set have; sum=sum(of pop1-pop99); run;
That way, all the variables will exist, and the order of the variables on the dataset or the names of other variables do not matter.
This is a good warning and a good solution, but if the user only started out with say POP1-POP59, they would now have 40 excess variables in the output dataset.
One might get around this by:
data want;
set have;
sum=sum(of pop1:, of pop2:, of pop3:, of pop4:, of pop5:, of pop6:, of pop7:, of pop8:, of pop9:);
run;
It's a bit wordy but:
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.