My data looks like this.
Obs Id Numeric Character
1 1 1 xxx
2 1 2 xxx
3 2 3 xyz
4 2 4 xyz
5 3 5 zzz
6 3 6 zzz
What I want to do is create a new table that is grouped by the ID variable, while summing the numeric variable and retaining the value of the character variable (which is the same for every ID).
I know how to do this for this exact example with proc sql, a select statement and summing Numeric, and using a group by statement, but the problem is there are hundreds of numeric variables in my actual data set and I want to do it for every one. Anyone have any ideas?
Thanks
Use the AUTONAME option of proc summary to avoid trouble with variable names
proc summary data=have;
by id notsorted;
id character;
var _numeric_;
output out=want( drop=id_sum _: ) sum= / autoname;
run;
UNTESTED CODE
proc summary data=whatever;
id character;
class id;
var _numeric_;
output out=sums sum=;
run;
Use the AUTONAME option of proc summary to avoid trouble with variable names
proc summary data=have;
by id notsorted;
id character;
var _numeric_;
output out=want( drop=id_sum _: ) sum= / autoname;
run;
What "trouble with variable names" can happen in this situation?
When ID is being used as a class or by variable but is also a _numeric_, it ends up being twice in the output dataset. Proc Summary doesn't like that.
I am not sure which interpretation is correct, so I have included two versions.
If the objective is to do running sums of each numeric variable (individually), grouped by ID, then in theory, something like this could work:
DATA have;
INPUT id Numeric1 Numeric2 Numeric3 Numeric4 Character $;
DATALINES;
1 1 11 115 1199 xxx
1 2 22 225 2299 xxx
2 3 33 335 3399 xyz
2 4 44 445 4499 xyz
3 5 55 555 5599 zzz
3 6 66 665 6699 zzz
;
RUN;
/* untested code */
DATA want;
SET have;
BY id; /* assumes your data is sorted properly */
LENGTH Sum1-Sum4 8;
ARRAY num_array Numeric1-Numeric4;
ARRAY sum_array Sum1-Sum4;
DO OVER num_array;
IF first.id THEN sum_array = num_array;
ELSE sum_array + num_array;
END;
RUN;
If the objective is to lump all numeric variables together (with the exception of ID) and find their running sum, something like this could work:
/* untested code */
DATA want;
SET have;
BY id;
/* id is numeric and it will be summed; thus negate it ... */
obs_total = SUM(of _numeric_) - id;
IF first.id THEN running_total = obs_total;
ELSE running_total + obs_total;
RUN;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.