BookmarkSubscribeRSS Feed
zetter
Calcite | Level 5

I have table A that looks like this, each lline has a unique id, but I havent included it here:

 

year

month

colour

age

sex

total

2014

Jan

red

10

m

1

2014

Jan

red

10

m

1

2014

Jan

yellow

14

F

1

2014

Jan

yellow

14

m

1

2014

Jan

yellow

14

F

1

2014

Jan

Blue

16

F

1

2014

Jan

brown

14

F

1

     

I want to summarise this data by summing up each unique combination of  colour, age and sex  like this:

 

Table b:

year

month

colour

age

sex

total

2014

Jan

red

10

m

2

2014

Jan

yellow

14

f

2

2014

Jan

yellow

14

m

1

2014

Jan

Blue

16

F

1

2014

Jan

brown

14

F

1

 

 Thanks.

9 REPLIES 9
Reeza
Super User

PROC FREQ

 

List the variables you want in the rows in a Table statement separated by an asterisk.

PROC FREQ, by default, shows the output in the output window, and using the OUT option creates a data set with the information as well.

 

proc freq data=have;
table year*month*colour*age*sex/out=want;
run;

proc print data=want;
run;
FreelanceReinh
Jade | Level 19

PROC SQL would be another option. It doesn't require a WEIGHT statement to sum up the individual TOTALs and it can preserve the sort order of the observations as far as possible (provided they are sorted by the unique ID in dataset HAVE).

data have;
input id year month $ colour $ age sex $ total;
cards;
1 2014 Jan red 10 m 1
2 2014 Jan red 10 m 1 
3 2014 Jan yellow 14 F 1
4 2014 Jan yellow 14 m 1
5 2014 Jan yellow 14 F 1 
6 2014 Jan Blue 16 F 1 
7 2014 Jan brown 14 F 1
;

proc sql;
create table want(drop=minid) as
select year, month, colour, age, sex, sum(total) as total, min(id) as minid
from have
group by year, month, colour, age, sex
order by minid;
quit;
zetter
Calcite | Level 5
proc sql;
create table want(drop=minid) as
select year, month, colour, age, sex, sum(total) as total, min(id) as minid
from have
group by year, month, colour, age, sex
order by minid;
quit;

 What does the min after id mean?

 

Thanks.

FreelanceReinh
Jade | Level 19

My intention was to preserve the original sort order of the observations as far as possible (not sure if this is relevant at all for your purposes). As you may have noticed, PROC FREQ automatically sorts the year-month-colour-age-sex combinations by year, month, colour, age, sex, so that, for example, "Apr" would come before "Jan" of the same year etc. I wrote "as far as possible," because your example data already contain a case where some decision needs to be made regarding the sort order of the output dataset: Observations 3 and 5 are to be aggregated. So, the aggregated observation must come either before or after the aggregated observation resulting from original observation no. 4.

 

My suggestion is to use the minimum ID value of each BY group as the sort key. That is, in the example, the aggregated obs. from id=3 and id=5 would be assigned the id 3 (the minimum of 3 and 5) and hence sorted before the aggregated obs. with id=4. Variable MINID contains that minimum, but is dropped (by the DROP= dataset option) after it has served its purpose in the ORDER BY clause.

Cynthia_sas
SAS Super FREQ

Hi:

Both PROC REPORT and PROC TABULATE support ORDER=DATA, so if the data is in some order that you want to maintain, you can use either of those procedures to do what you want.
 
I think there is a problem with your data, though. It is not really ordered. for example, if you look at Jan, yellow, 14 rows -- there are 3 rows - and they are unordered -- you have a row for F, then a row for M, then another row for F. So if you wanted to use a BY with a NOTSORTED, this data would fail, because your groups are not maintaining ORDER.

cynthia

zetter
Calcite | Level 5

Hi ive tried to use this solution. For now im not worried about the ordering. But im getting this error message:

 

ERROR: The SUM summary function requires a numeric argument

 

ive checked everyithing and all is okay. Is there another code i  can use to get round this problem?

 

Thanks.

ballardw
Super User

Does your initial variable TOTAL have values other than 1? With what you show for input it isn't quite clear if your are asking for an actual SUM or a COUNT of rows. If TOTAL is 1 for every record then there's not much difference but if Total has other variables then you may want something like:

 

proc summary data=have nway;

   class year month colour age sex;

   var total;

   output out=want(drop = _: ) sum= ;

run;

 

NOTE: IF Month is character then neither of these approaches will come out in calendar order as the months are not alphabetical.

It may be better to use SAS functions to create a SAS date valued variable to use so sorts work properly.

 

 

zetter
Calcite | Level 5
Hi ive tried to use this solution. For now im not worried about the ordering. But im getting this error message:

ERROR: The SUM summary function requires a numeric argument

ive checked everyithing and all is okay. Is there another code i can use to get round this problem?

Thanks.
Reeza
Super User

No, if it fails with this it will fail with others.

 

Post a proc contents of your dataset and ensure the variable is numeric in type. Also, post your code its possible you made a mistake somewhere.

 

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1264 views
  • 0 likes
  • 5 in conversation