DATA Step, Macro, Functions and more

How do I summarise this table?

Reply
Contributor
Posts: 36

How do I summarise this table?

I have table A that looks like this, each lline has a unique id, but I havent included it here:

 

year

month

colour

age

sex

total

2014

Jan

red

10

m

1

2014

Jan

red

10

m

1

2014

Jan

yellow

14

F

1

2014

Jan

yellow

14

m

1

2014

Jan

yellow

14

F

1

2014

Jan

Blue

16

F

1

2014

Jan

brown

14

F

1

     

I want to summarise this data by summing up each unique combination of  colour, age and sex  like this:

 

Table b:

year

month

colour

age

sex

total

2014

Jan

red

10

m

2

2014

Jan

yellow

14

f

2

2014

Jan

yellow

14

m

1

2014

Jan

Blue

16

F

1

2014

Jan

brown

14

F

1

 

 Thanks.

Super User
Posts: 19,768

Re: How do I summarise this table?

PROC FREQ

 

List the variables you want in the rows in a Table statement separated by an asterisk.

PROC FREQ, by default, shows the output in the output window, and using the OUT option creates a data set with the information as well.

 

proc freq data=have;
table year*month*colour*age*sex/out=want;
run;

proc print data=want;
run;
Trusted Advisor
Posts: 1,117

Re: How do I summarise this table?

PROC SQL would be another option. It doesn't require a WEIGHT statement to sum up the individual TOTALs and it can preserve the sort order of the observations as far as possible (provided they are sorted by the unique ID in dataset HAVE).

data have;
input id year month $ colour $ age sex $ total;
cards;
1 2014 Jan red 10 m 1
2 2014 Jan red 10 m 1 
3 2014 Jan yellow 14 F 1
4 2014 Jan yellow 14 m 1
5 2014 Jan yellow 14 F 1 
6 2014 Jan Blue 16 F 1 
7 2014 Jan brown 14 F 1
;

proc sql;
create table want(drop=minid) as
select year, month, colour, age, sex, sum(total) as total, min(id) as minid
from have
group by year, month, colour, age, sex
order by minid;
quit;
Contributor
Posts: 36

Re: How do I summarise this table?

Posted in reply to FreelanceReinhard
proc sql;
create table want(drop=minid) as
select year, month, colour, age, sex, sum(total) as total, min(id) as minid
from have
group by year, month, colour, age, sex
order by minid;
quit;

 What does the min after id mean?

 

Thanks.

Trusted Advisor
Posts: 1,117

Re: How do I summarise this table?

[ Edited ]

My intention was to preserve the original sort order of the observations as far as possible (not sure if this is relevant at all for your purposes). As you may have noticed, PROC FREQ automatically sorts the year-month-colour-age-sex combinations by year, month, colour, age, sex, so that, for example, "Apr" would come before "Jan" of the same year etc. I wrote "as far as possible," because your example data already contain a case where some decision needs to be made regarding the sort order of the output dataset: Observations 3 and 5 are to be aggregated. So, the aggregated observation must come either before or after the aggregated observation resulting from original observation no. 4.

 

My suggestion is to use the minimum ID value of each BY group as the sort key. That is, in the example, the aggregated obs. from id=3 and id=5 would be assigned the id 3 (the minimum of 3 and 5) and hence sorted before the aggregated obs. with id=4. Variable MINID contains that minimum, but is dropped (by the DROP= dataset option) after it has served its purpose in the ORDER BY clause.

SAS Super FREQ
Posts: 8,861

Re: How do I summarise this table?

Posted in reply to FreelanceReinhard

Hi:

Both PROC REPORT and PROC TABULATE support ORDER=DATA, so if the data is in some order that you want to maintain, you can use either of those procedures to do what you want.
 
I think there is a problem with your data, though. It is not really ordered. for example, if you look at Jan, yellow, 14 rows -- there are 3 rows - and they are unordered -- you have a row for F, then a row for M, then another row for F. So if you wanted to use a BY with a NOTSORTED, this data would fail, because your groups are not maintaining ORDER.

cynthia

Contributor
Posts: 36

Re: How do I summarise this table?

Posted in reply to FreelanceReinhard

Hi ive tried to use this solution. For now im not worried about the ordering. But im getting this error message:

 

ERROR: The SUM summary function requires a numeric argument

 

ive checked everyithing and all is okay. Is there another code i  can use to get round this problem?

 

Thanks.

Super User
Posts: 11,336

Re: How do I summarise this table?

Does your initial variable TOTAL have values other than 1? With what you show for input it isn't quite clear if your are asking for an actual SUM or a COUNT of rows. If TOTAL is 1 for every record then there's not much difference but if Total has other variables then you may want something like:

 

proc summary data=have nway;

   class year month colour age sex;

   var total;

   output out=want(drop = _: ) sum= ;

run;

 

NOTE: IF Month is character then neither of these approaches will come out in calendar order as the months are not alphabetical.

It may be better to use SAS functions to create a SAS date valued variable to use so sorts work properly.

 

 

Contributor
Posts: 36

Re: How do I summarise this table?

Hi ive tried to use this solution. For now im not worried about the ordering. But im getting this error message:

ERROR: The SUM summary function requires a numeric argument

ive checked everyithing and all is okay. Is there another code i can use to get round this problem?

Thanks.
Super User
Posts: 19,768

Re: How do I summarise this table?

No, if it fails with this it will fail with others.

 

Post a proc contents of your dataset and ensure the variable is numeric in type. Also, post your code its possible you made a mistake somewhere.

 

 

 

Ask a Question
Discussion stats
  • 9 replies
  • 331 views
  • 0 likes
  • 5 in conversation