Hello everyone,
I have a simple summary table that shows the number of assignments per quarter. It looks something like this:
Type | OCT23 | NOV23 | DEC23 |
AdHoc | 10 | 9 | 8 |
Report | 5 | 7 | 3 |
Survey | 1 | ||
Modification | 3 | 5 | 4 |
I want to develop a "Total" column that gives the sum of each assignment type for the entire quarter, so something like this:
Type | OCT23 | NOV23 | DEC23 | Total |
AdHoc | 10 | 9 | 8 | 27 |
Report | 5 | 7 | 3 | 15 |
Survey | 1 | 1 | ||
Modification | 3 | 5 | 4 | 12 |
A note, the three month columns (OCT23, NOV23, DEC23) will change each quarter when this report is ran. So it would be nice to have a solution that accounts for this. I was reviewing this post: https://communities.sas.com/t5/SAS-Procedures/How-to-sum-selected-array-variables/td-p/129788
Their dataset is structured differently than mine, would I also need to restructure it similarly?
Here are some datalines that output a summary table of what I have so far.
data have; length Type $12; input Type OCT23 NOV23 DEC23; datalines; AdHoc 10 9 8 Report 5 7 3 Survey 1 0 0 Modification 3 5 4 ; run;
Thanks in advance!
The answer marked correct clearly works on your example data set, @mreynaud , but there are conditions that have to hold for this solution to work on your real data, which you need to be aware of.
If your real world data set has numeric values that are not to be included in the sum (such as a numeric ID value, or a loan balance, or a crop yield, or any other variable that is numeric), then this solution does not work. The solution presented using a long data set does not have this drawback, and as stated has many other advantages as well.
While you are absolutely correct about this, I proposed a solution based on the question as presented. Actually, I fully expected that the original poster would write back saying that the solution didn't work (because of the presence of other numeric variables). The lesson (in my mind) was going to be that you need to accurately describe the data if you want an accurate solution. I thought recommendations to restructure the data and then learn a summary procedure were too complex for a programmer asking the original question.
If another variable is present, the fix might be any one of a number of things that use the current structure to the data. For example, if the data added a numeric variable YEAR, it might be:
total = sum(of _numeric_) - year;
Or even (if there are 100 numeric variables (beginning with YEAR) added after the "month" variables:
total = sum(of type - numeric - year) - year;
SAS won't care that TYPE is character, it would simply omit TYPE from the specified list.
And if the other numeric variables are scattered throughout the data as possibility would be:
total = sum(of jan2:, of feb2:, of mar2:, of apr2:, of may2:, of jun2:, of jul2:, of aug2:, of sep2:, of oct2:, of nov2:, of dec2:);
SAS should complain if not all of these variables exist, but it should compute properly anyway. Yes, this solution only lasts through 2029, but by that time all involved should be wiser and able to devise something better.
For the sophisticated, you could even try:
data want;
set have (keep= jan2: feb2: mar2: apr2: may2: jun2: jul2: aug2: sep2: oct2: nov2: dec2:);
type = sum(of _numeric_);
set have;
run;
My bad for not mentioning that the inclusion of other numeric variables would throw off the total. I did fully expect that the issue would arise during the process of ironing out a solution.
Typically, this wide layout of your data is a poor layout. Avoid having calendar information in a variable name. A long layout of data, where the calendar information is a value of a variable named MONTH. Most analyses in SAS PROCs require the long layout rather than the wide layout.
In this case, it probably doesn't matter which layout you use, wide or long, but in many other cases, wide is a poor choice, and long will make your programming easier; and so using a long layout is a good habit to get into.
Solution for wide data:
data want;
set have;
total = sum(oct23,nov23,dec23);
run;
Pretty simple, except you have to type all three month names. And if the time period changes, you have to re-type the month names.
LONG data set.
data work.LONG;
infile datalines dsd truncover;
input Type:$12. month:MONYY7. value:32.;
format month MONYY7.;
datalines4;
AdHoc,OCT2023,10
AdHoc,NOV2023,9
AdHoc,DEC2023,8
Report,OCT2023,5
Report,NOV2023,7
Report,DEC2023,3
Survey,OCT2023,1
Survey,NOV2023,0
Survey,DEC2023,0
Modification,OCT2023,3
Modification,NOV2023,5
Modification,DEC2023,4
;;;;
Solution for long data set
proc summary data=long nway;
var value;
class month;
output out=want sum=total;
run;
Also, pretty simple, but note that if you use the long data set, your code doesn't have to change next month to handle different months. So in the long run, the long data set will take less time for you to program
A very powerful tool in SAS is the ability to create report groups based on the Formatted values of a variable. To extend @PaigeMiller's example to a longer data set and see some options using the different formats for the Month variable:
data work.LONG2; infile datalines dsd truncover; input Type:$12. month:MONYY7. value:32.; format month MONYY7.; datalines4; AdHoc,OCT2023,10 AdHoc,NOV2023,9 AdHoc,DEC2023,8 Report,OCT2023,5 Report,NOV2023,7 Report,DEC2023,3 Survey,OCT2023,1 Survey,NOV2023,0 Survey,DEC2023,0 Modification,OCT2023,3 Modification,NOV2023,5 Modification,DEC2023,4 AdHoc,SEP2023,18 AdHoc,AUG2023,19 AdHoc,JUL2023,11 Report,SEP2023,1 Report,AUG2023,4 Report,JUL2023,8 Survey,SEP2023,3 Survey,AUG2023,2 Survey,JUL2023,1 Modification,SEP2023,10 Modification,AUG2023,13 Modification,JUL2023,12 ;;;; proc tabulate data=work.long2; class type month; var value; table type, (Month='' All='All months')*value=''*sum='' ; run; proc tabulate data=work.long2; class type month; format month yyq.; var value; table type, (Month='' All='All Quarters')*value=''*sum='' ; run;
See the difference just changing the format makes? You can get quite interesting results with actual dates such as weekly, monthly, quarterly or annual summaries if you have actual dates of record just by changing the format. With formats like Month and Qtr you can even combine values across years if that is desirable. Just by changing the format.
Plus with some additional work you can create custom formats to indicate values based on a fiscal year or even different intervals such a perhaps reporting the recent values weekly and then for older values use month, quarter or yearly, sort of a "year to date" difference. All without changing a single value in the actual data set.
The groups created by formats are also used in analysis procedures and generally for graphing (some restrictions on custom date/time/datetime formats may apply).
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.