I have table A that looks like this, each lline has a unique id, but I havent included it here:
year |
month |
colour |
age |
sex |
total |
2014 |
Jan |
red |
10 |
m |
1 |
2014 |
Jan |
red |
10 |
m |
1 |
2014 |
Jan |
yellow |
14 |
F |
1 |
2014 |
Jan |
yellow |
14 |
m |
1 |
2014 |
Jan |
yellow |
14 |
F |
1 |
2014 |
Jan |
Blue |
16 |
F |
1 |
2014 |
Jan |
brown |
14 |
F |
1 |
I want to summarise this data by summing up each unique combination of colour, age and sex like this:
Table b:
year |
month |
colour |
age |
sex |
total |
2014 |
Jan |
red |
10 |
m |
2 |
2014 |
Jan |
yellow |
14 |
f |
2 |
2014 |
Jan |
yellow |
14 |
m |
1 |
2014 |
Jan |
Blue |
16 |
F |
1 |
2014 |
Jan |
brown |
14 |
F |
1 |
Thanks.
PROC FREQ
List the variables you want in the rows in a Table statement separated by an asterisk.
PROC FREQ, by default, shows the output in the output window, and using the OUT option creates a data set with the information as well.
proc freq data=have;
table year*month*colour*age*sex/out=want;
run;
proc print data=want;
run;
PROC SQL would be another option. It doesn't require a WEIGHT statement to sum up the individual TOTALs and it can preserve the sort order of the observations as far as possible (provided they are sorted by the unique ID in dataset HAVE).
data have;
input id year month $ colour $ age sex $ total;
cards;
1 2014 Jan red 10 m 1
2 2014 Jan red 10 m 1
3 2014 Jan yellow 14 F 1
4 2014 Jan yellow 14 m 1
5 2014 Jan yellow 14 F 1
6 2014 Jan Blue 16 F 1
7 2014 Jan brown 14 F 1
;
proc sql;
create table want(drop=minid) as
select year, month, colour, age, sex, sum(total) as total, min(id) as minid
from have
group by year, month, colour, age, sex
order by minid;
quit;
proc sql;
create table want(drop=minid) as
select year, month, colour, age, sex, sum(total) as total, min(id) as minid
from have
group by year, month, colour, age, sex
order by minid;
quit;
What does the min after id mean?
Thanks.
My intention was to preserve the original sort order of the observations as far as possible (not sure if this is relevant at all for your purposes). As you may have noticed, PROC FREQ automatically sorts the year-month-colour-age-sex combinations by year, month, colour, age, sex, so that, for example, "Apr" would come before "Jan" of the same year etc. I wrote "as far as possible," because your example data already contain a case where some decision needs to be made regarding the sort order of the output dataset: Observations 3 and 5 are to be aggregated. So, the aggregated observation must come either before or after the aggregated observation resulting from original observation no. 4.
My suggestion is to use the minimum ID value of each BY group as the sort key. That is, in the example, the aggregated obs. from id=3 and id=5 would be assigned the id 3 (the minimum of 3 and 5) and hence sorted before the aggregated obs. with id=4. Variable MINID contains that minimum, but is dropped (by the DROP= dataset option) after it has served its purpose in the ORDER BY clause.
Hi:
Both PROC REPORT and PROC TABULATE support ORDER=DATA, so if the data is in some order that you want to maintain, you can use either of those procedures to do what you want.
I think there is a problem with your data, though. It is not really ordered. for example, if you look at Jan, yellow, 14 rows -- there are 3 rows - and they are unordered -- you have a row for F, then a row for M, then another row for F. So if you wanted to use a BY with a NOTSORTED, this data would fail, because your groups are not maintaining ORDER.
cynthia
Hi ive tried to use this solution. For now im not worried about the ordering. But im getting this error message:
ERROR: The SUM summary function requires a numeric argument
ive checked everyithing and all is okay. Is there another code i can use to get round this problem?
Thanks.
Does your initial variable TOTAL have values other than 1? With what you show for input it isn't quite clear if your are asking for an actual SUM or a COUNT of rows. If TOTAL is 1 for every record then there's not much difference but if Total has other variables then you may want something like:
proc summary data=have nway;
class year month colour age sex;
var total;
output out=want(drop = _: ) sum= ;
run;
NOTE: IF Month is character then neither of these approaches will come out in calendar order as the months are not alphabetical.
It may be better to use SAS functions to create a SAS date valued variable to use so sorts work properly.
No, if it fails with this it will fail with others.
Post a proc contents of your dataset and ensure the variable is numeric in type. Also, post your code its possible you made a mistake somewhere.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.