Hi All,
I have data that looks like this:
misses | absents | goals | summer | autumn | winter | spring |
10 | 1 | 1 | mike | sharon | peter | john |
3 | 0 | 2 | sharon | mike | sharon | dan |
5 | 6 | 9 | sharon | peter | peter | sharon |
4 | 9 | 2 | peter | sharon | john | micheal |
3 | 2 | 8 | mike | peter | peter | dan |
and i'm trying to form multiple tables (for each season) whilst grouping(summing over the first 3 columns) the data.
[I want to run the same code over misses --- goals for each season)
i.e. I want the final results to look like this:
sum_misses | sum_ absents | sum_goals | autumn |
0 | 0 | 0 | mike |
14 | 10 | 3 | sharon |
8 | 8 | 17 | peter |
sum_misses | sum_ absents | sum_goals | winter |
0 | 0 | 0 | mike |
3 | 0 | 2 | sharon |
18 | 9 | 18 | peter |
sum_misses | sum_ absents | sum_goals | spring |
4 | 9 | 2 | mike |
6 | 2 | 10 | sharon |
15 | 7 | 10 | peter |
sum_misses | sum_ absents | sum_goals | summer |
13 | 3 | 9 | mike |
8 | 6 | 11 | sharon |
4 | 9 | 2 | peter |
Please assist;
Hi @Solly7
Here is an approach to achieve this. Hope this helps!
Best,
data have;
infile datalines dlm="09"x;
input misses absents goals summer $ autumn $ winter $ spring $;
datalines;
10 1 1 mike sharon peter john
3 0 2 sharon mike sharon dan
5 6 9 sharon peter peter sharon
4 9 2 peter sharon john micheal
3 2 8 mike peter peter dan
;
run;
%macro mymac(season);
proc sql;
create table &season as
select sum(misses) as sum_misses,
sum(absents) as sum_absents,
sum(goals) as sum_goals,
&season
from have
group by &season;
run;
%mend;
%mymac(spring)
%mymac(summer)
%mymac(autumn)
%mymac(winter)
I'll point out that this "wide" organization of data, with time (in this case seasons) as variable names, causes this problem that you need to write some sort of looping or macro to get the code to do what you want. If you produce a "long" data set, where each person's information for each season is on one line, then a BY variable or a CLASS variable in PROC SUMMARY makes the programming much easier.
What I'm talking about is data organized like this:
misses absents goals season person 10 1 1 summer mike 10 1 1 autumn shannon
Hi @Solly7 The objective is rather straight forward in my opinion albeit I am hesitant about the splitting of datasets
data have;
infile datalines;
input misses absents goals summer $ autumn $ winter $ spring $;
datalines;
10 1 1 mike sharon peter john
3 0 2 sharon mike sharon dan
5 6 9 sharon peter peter sharon
4 9 2 peter sharon john micheal
3 2 8 mike peter peter dan
;
run;
data temp;
set have;
array t summer--spring;
do over t;
season=put(vname(t),$32.);
v=t;
output;
end;
drop summer--spring;
run;
proc summary data=temp nway;
class season v;
var misses--goals;
output out=want(drop=_:) sum=/autoname;
run;
/*Or perhaps you need completetypes and stdize the missing to zero*/
proc summary data=temp nway completetypes;
class season v;
var misses--goals;
output out=want(drop=_:) sum=/autoname;
run;
proc stdize data=want out=final_want reponly missing=0;
var misses_sum--goals_sum;
run;
I haven't split the Final_want above into datasets for each season (as most advise against it). I will leave that up-to you to decide. I would suggest you could wait for advise on that regard from wise people like @PaigeMiller / @Reeza / @ballardw and take a decision to proceed.
Kind Regards!
@novinosrin solution is correct 🙂
The only addition to his answer, is if you want it split out that way once it's split you could just use PROC TABULATE.
It depends on what you're doing next. If you're doing further processing use the solution from @novinosrin. If you're trying to display a report in Excel, PDF or Word, then using PROC TABULATE or REPORT or PRINT could be better solutions.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.