Dear SAS users,
Below is the sample dataset and a program that will produce a one line output that include team name and color and each team's multiple game result. It works well.
My question is suppose there are multiple entities (show up as a new id column such as schools id) and each entity will have the same set of groups, this is normal because schools may have the same set of groups and they will come to compete in some activities and got various result at different round of contests.
The new dataset (call it team_new, the new column is sch_id) will be like this:
sch_id color group game1 game2 game 3
sch1 Green Crickets 10 7 8
sch1 Blue Sea Otters 10 6 7
sch1 Yellow Stingers 9 10 9
sch1 Red Hot Ants 8 9 9
sch1 Purple Cats 9 9 9
sch2 Green Crickets 10 9 7
sch2 Blue Sea Otters 8 7 9
sch2 Yellow Stingers 7 8 10
sch2 Red Hot Ants 9 8 10
sch2 Purple Cats 8 6 9
sch3 Green Crickets 5 7 9
sch3 Blue Sea Otters 6 8 10
sch3 Yellow Stingers 7 9 8
sch3 Red Hot Ants 6 9 10
sch3 Purple Cats 8 10 9
;
The desired output will be like this:
sch_id greencricketstotal blueseaottertotal yellowstingerstotal redhotantstotl purplecatstotal
sch1 25 23 28 26 27
sch2 xx xx xx xx xx
sch3 xx xx xx xx xx
*note above XX will be the sum of three competition's result, not run yet.
data teams; *This is the original sample dataset's name; input color $15. @16 team_name $15. @32 game1 game2 game3; datalines; Green Crickets 10 7 8 Blue Sea Otters 10 6 7 Yellow Stingers 9 10 9 Red Hot Ants 8 9 9 Purple Cats 9 9 9 ; %macro newvars(dsn); data _null_; set &dsn end=end; count+1; call symputx('macvar'||left(count),compress(color)||compress(team_name)||"Total"); if end then call symputx('max',count); run; data teamscores; set &dsn end=end; %do i = 1 %to &max; if _n_=&i then do; &&macvar&i=sum(of game1-game3); retain &&macvar&i; keep &&macvar&i; end; %end; if end then output; %mend newvars; %newvars(teams) proc print noobs; title "League Team Game Totals"; run;
The above program runs well. However, when I created the new dataset team_new with one more new column sch_id, and I tried to modify the above program, I can have it run, but only produce one line of output, which is the last color+group's record. I know I need to have more codes, but just cannot get it out the way I need. Any idea what I should do? By the way I would like to use the macro, with some modification, and do not want to do a transpose step. Thank you.
What is the purpose of the output data set? I does not appear to be well structured for most analysis tasks.
If the use of the data is just to make a report that people read then you likely want a report procedure.
Maybe something like:
data teams; input sch_id $ color $15. @16 team_name $15. @32 game1 game2 game3; gametotal= sum(game1, game2, game3); datalines; sch1 Green Crickets 10 7 8 sch1 Blue Sea Otters 10 6 7 sch1 Yellow Stingers 9 10 9 sch1 Red Hot Ants 8 9 9 sch1 Purple Cats 9 9 9 sch2 Green Crickets 10 9 7 sch2 Blue Sea Otters 8 7 9 sch2 Yellow Stingers 7 8 10 sch2 Red Hot Ants 9 8 10 sch2 Purple Cats 8 6 9 sch3 Green Crickets 5 7 9 sch3 Blue Sea Otters 6 8 10 sch3 Yellow Stingers 7 9 8 sch3 Red Hot Ants 6 9 10 sch3 Purple Cats 8 10 9 ; proc tabulate data=teams; class sch_id color team_name; var gametotal; table sch_id=' '*gametotal=' '* sum=' ', color=' '*team_name=' ' / row=float ; run;
And you original data would be better off with one record per game plus a variable to identify which game the score reflects. Then a report can be written that automatically adjusts to the number of games.
Consider:
data newteams; input sch_id $ color :$15. @16 team_name $15. @32 gamenumber score; datalines; sch1 Green Crickets 1 10 sch1 Green Crickets 2 7 sch1 Green Crickets 3 8 sch1 Green Crickets 4 6 sch1 Blue Sea Otters 1 10 sch1 Blue Sea Otters 2 6 sch1 Blue Sea Otters 3 7 sch1 Blue Sea Otters 4 3 sch1 Yellow Stingers 1 9 sch1 Yellow Stingers 2 10 sch1 Yellow Stingers 3 9 sch1 Yellow Stingers 4 7 sch1 Red Hot Ants 1 8 sch1 Red Hot Ants 2 9 sch1 Red Hot Ants 3 9 sch1 Purple Cats 1 9 sch1 Purple Cats 2 9 sch1 Purple Cats 3 9 sch2 Green Crickets 1 10 sch2 Green Crickets 2 9 sch2 Green Crickets 3 7 sch2 Blue Sea Otters 1 8 sch2 Blue Sea Otters 2 7 sch2 Blue Sea Otters 3 9 sch2 Yellow Stingers 1 7 sch2 Yellow Stingers 2 8 sch2 Yellow Stingers 3 10 sch2 Red Hot Ants 1 9 sch2 Red Hot Ants 2 8 sch2 Red Hot Ants 3 10 sch2 Purple Cats 1 8 sch2 Purple Cats 2 6 sch2 Purple Cats 3 9 sch3 Green Crickets 1 5 sch3 Green Crickets 2 7 sch3 Green Crickets 3 9 sch3 Blue Sea Otters 1 6 sch3 Blue Sea Otters 2 8 sch3 Blue Sea Otters 3 10 sch3 Yellow Stingers 1 7 sch3 Yellow Stingers 2 9 sch3 Yellow Stingers 3 8 sch3 Red Hot Ants 1 6 sch3 Red Hot Ants 2 9 sch3 Red Hot Ants 3 10 sch3 Purple Cats 1 8 sch3 Purple Cats 2 10 sch3 Purple Cats 3 9 ; proc tabulate data=newteams; class sch_id color team_name; var score; table sch_id=' '*score=' '* sum=' ', color=' '*team_name=' ' / row=float ; run;
If you look at the data you will see that several of the teams have different numbers of games but we still get an overall total without having to do anything different for different numbers of scores. And a minor change you can get different measures
proc tabulate data=newteams; class sch_id color team_name; var score; table sch_id=' '*score=' '* (sum='Total score ' mean='Average score'), color=' '*team_name=' ' / row=float ; run;
Hi, Ballardw,
The purpose of this request is to enhance the efficiency.
I have datasets similar to this structure, that is sch_id, subgroup (has 10 rows) and annual performance data (say PI1, PI2, PI3).
I m looking for a solution to quickly process the data to have each school has its data of group_year_PI columns by each school_ID. The Macro should save me a lot of other data steps.
Hope this help.
Since you said, "I would like to use the macro, with some modification, and do not want to do a transpose step," here is a solution that just requires a small edit to your macro. Instead of just outputting at the END of the dataset, it will output at the end of each distinct sch_id group.
%macro newvars(dsn);
data _null_;
set &dsn end=end;
count+1;
call symputx('macvar'||left(count),compress(color)||compress(team_name)||"Total");
if end then call symputx('max',count);
run;
data teamscores;
set &dsn;
by sch_id;
%do i = 1 %to &max;
if _n_=&i then do;
&&macvar&i=sum(of game1-game3);
retain &&macvar&i;
keep sch_id &&macvar&i;
end;
%end;
if last.sch_id then output;
run;
%mend newvars;
%newvars(teams)
HI mklangley,
Thank you for taking your time reading my questions and making out a test program. I run your program. It worked. However, it still produce just one line, which is the very last record's sch_id, instead of three lines for three sch_id. (previously I did some trials and always have a output of one line of the last record in the input data set.
I was wondering it has to do with the sch_id column, somehow, I cannot get it done properly.
Thank you. I will keep trying in finding a way out...
data teams;
input sch_id $ color $15. @16 team_name $15. @32 game1 game2 game3;
gametotal= sum(game1, game2, game3);
datalines;
sch1 Green Crickets 10 7 8
sch1 Blue Sea Otters 10 6 7
sch1 Yellow Stingers 9 10 9
sch1 Red Hot Ants 8 9 9
sch1 Purple Cats 9 9 9
sch2 Green Crickets 10 9 7
sch2 Blue Sea Otters 8 7 9
sch2 Yellow Stingers 7 8 10
sch2 Red Hot Ants 9 8 10
sch2 Purple Cats 8 6 9
sch3 Green Crickets 5 7 9
sch3 Blue Sea Otters 6 8 10
sch3 Yellow Stingers 7 9 8
sch3 Red Hot Ants 6 9 10
sch3 Purple Cats 8 10 9
;
run;
proc means data=teams nway missing noprint;
class sch_id color team_name;
var game:;
output out=sum1 sum=;
run;
data sum1;
length colorteam $32.;
set sum1;
colorteam = cats(strip(compress(Color)),strip(compress(team_name)));
run;
proc transpose data=sum1 out=sum2;
by sch_id;
id colorteam;
var gametotal;
run;
Check if this works for you
@Petergao1 Actually, if you run the code below, you should see it produce three rows in the output dataset--one for each sch_id. Are you getting a different result? If so, please post the full code/log that you're are running.
data teams; input sch_id $ color $ team_name & $10. game1 game2 game3; datalines; sch1 Green Crickets 10 7 8 sch1 Blue Sea Otters 10 6 7 sch1 Yellow Stingers 9 10 9 sch1 Red Hot Ants 8 9 9 sch1 Purple Cats 9 9 9 sch2 Green Crickets 10 9 7 sch2 Blue Sea Otters 8 7 9 sch2 Yellow Stingers 7 8 10 sch2 Red Hot Ants 9 8 10 sch2 Purple Cats 8 6 9 sch3 Green Crickets 5 7 9 sch3 Blue Sea Otters 6 8 10 sch3 Yellow Stingers 7 9 8 sch3 Red Hot Ants 6 9 10 sch3 Purple Cats 8 10 9 ; run; %macro newvars(dsn); %macro _; %mend _; data _null_; set &dsn end=end; count+1; call symputx('macvar'||left(count),compress(color)||compress(team_name)||"Total"); if end then call symputx('max',count); run; data teamscores; set &dsn; by sch_id; %do i = 1 %to &max; if _n_=&i then do; &&macvar&i=sum(of game1-game3); retain &&macvar&i; keep sch_id &&macvar&i; end; %end; if last.sch_id then output; run; %mend newvars; %newvars(teams); proc print; run;
My output:
Dear mklangley,
Thanks for your continued attention and help. Today I pasted your codes into my SAS EG, it worked! I got output of four lines, each row is a unique sch_id. The new columns are group_color combined for the sum, This is exactly what I am looking for.
Thank you so much.
Next I will modify my codes according to my real world data structure and apply them to my dataset to see if I can get them out as desired.
Thank you everyone who read and gave your help in many ways.
Best wishes, see you around.
Peter
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.