I am using SAS 9.4 I would like to sum a number of variables, pop1 to popNN, where the greatest number, NN, will differ with every run. Is there a way to reference a number dynamically in a sum(of pop1--popNN) function? Can I use a macro variable here, like sum(of pop1--pop¯ovariable.) ?
thanks.
You can do something like this:
data want;
set have;
sum_of_pop=sum(of pop:);
run;
which tells SAS to sum all variables whose name begins with the letters "POP". SAS will figure out how many such variables there are.
Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html
You can use colon wildcard option here, assuming you have a unique and consistent prefix.
sum(of pop:)
@GoBlueSAS wrote:
I am using SAS 9.4 I would like to sum a number of variables, pop1 to popNN, where the greatest number, NN, will differ with every run. Is there a way to reference a number dynamically in a sum(of pop1--popNN) function? Can I use a macro variable here, like sum(of pop1--pop¯ovariable.) ?
thanks.
The colon wildcard is the best option if the prefixes are suitable.
Otherwise, if you can create a missing variable after POPnn (let's call it LASTVAR), and if the POP variables are consecutive, you can use
sum (of POP1--LASTVAR)
You should also consider working with a long dataset layout, where you have only one variable and sum vertically as needed.
Yet another take on this.
The colon wildcard ,sum(of pop:), is the simplest, but it may not work if you have other variables that begin with POP.
The double dash approach ,sum(of pop--lastvar) only works if the POP variables are last in your input data.
A more safe approach may be this:
data want;
retain pop1-pop99 .;
set have;
sum=sum(of pop1-pop99);
run;
That way, all the variables will exist, and the order of the variables on the dataset or the names of other variables do not matter.
EDIT: changed the retain value from 0 to missing, so that means or standard deviations will also be calculated correctly.
@s_lassen wrote:
Yet another take on this.
The colon wildcard ,sum(of pop:), is the simplest, but it may not work if you have other variables that begin with POP.
The double dash approach ,sum(of pop--lastvar) only works if the POP variables are last in your input data.
A more safe approach may be this:
data want; retain pop1-pop99 .; set have; sum=sum(of pop1-pop99); run;That way, all the variables will exist, and the order of the variables on the dataset or the names of other variables do not matter.
This is a good warning and a good solution, but if the user only started out with say POP1-POP59, they would now have 40 excess variables in the output dataset.
One might get around this by:
data want;
set have;
sum=sum(of pop1:, of pop2:, of pop3:, of pop4:, of pop5:, of pop6:, of pop7:, of pop8:, of pop9:);
run;
It's a bit wordy but:
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.