BookmarkSubscribeRSS Feed
EC27556
Quartz | Level 8

Yesterday I posted this:

 

I currently have a situation where I have:

 

ID           Cost

1               10

1                5

2               10

3               10

3               10

 

What code can I use to convert the above into:

 

ID           Cost

1               15

2               10

3               20

 

And someone gave an answer as this:

 

data have;
    input id cost;
    datalines; 1    10

                     1    5

                     2   10

                     3   10

                     3   10 ;
run;

proc sql;

create table want as

select id, sum(cost) as cost

from have

groupby id;

quit;

 

 

However, what if my data instead looked like this:

 

data have;
    input id cost date;
    datalines; 1    10   Monday

                     1    5    Monday

                     2   10   Tuesday

                     3   10   Wednesday

                     3   10   Wednesday;

 

 

How would you amend the above code to include the date in the output table?

 

proc sql;

create table want as

select id, date, sum(cost) as cost

from have

group by id;

quit;

 

this fails to append the observations so do I instead need to also add date to the by statement? (Bearing in mind that each id always has the same day associated with it)

 

Thanks! 🙂

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Is this a realistic example? Every transaction on Monday, regardless of whether it was Monday this week or Monday last week, or Monday from 3.25 years ago, will get lumped together. Is that what you want? If not, can you give us a more realistic example?

 

Assuming you have more realistic dates, then PROC SUMMARY or PROC MEANS is the way to go here.

 

proc summary nway data=have;
    class id date;
    var cost;
    output out=want sum=;
run;
--
Paige Miller
EC27556
Quartz | Level 8
Thanks, I'm curious to know why you would recommend proc means/summary over sql though!

The example wasn't very logical, I admit! I am essentially considering a situation where you have lots of repeating observations which are almost identical except for cost value. I.e. if you have id=1 for 3 observations in the dataset, all the variable values will be the same (e.g. height, weight) apart from cost. I am looking for code which enables to turn those 3 observations into 1 observation with a summed cost variable!
ballardw
Super User

@EC27556 wrote:
Thanks, I'm curious to know why you would recommend proc means/summary over sql though!

Means/Summary can be faster than SQL.

Summary can produce groups of summaries of 1) over the whole data set, 2) the id values only,  3) the dates only and 4) the Id*date combinations. With one call to the procedure.

 

Summary allows use of features like variable lists. Consider a data set with  100 variables. In Proc SQL you have to list every single variable to get the summary. You could sum all the numeric variables by using: Var _numeric_. Or lists of common based names like var_: using all variables whose names start with Var_, or the -- (two dashes) that get contiguous variables, or iterated list like var1 - var25 to summarize 25 variables with names like that.

 

The AUTONAME feature means that if you request multiple statistics for variables then the output variables are named with the statistic as a suffix. SQL requires setting a specific name for each.

And the syntax for mixing which variables get which statistics when they are not all getting the same statistic is again way shorter than Proc SQL:   mean(var1-var10) =    stddev(var5-var15) = , for example instead of 10 Mean(variable) and a different 10 stddev(variable).

And I'm not sure if SQL does percentiles.

 

Proc means/summary can use multilabel formats for variables, Proc SQL can't.

Some other options like COMPLETETYPES can create output that Proc SQL won't without additional programming.

Check the syntax for TYPES, ID, and WAYS for other output that would require lots of Proc SQL programming to accomplish.

 

It is a whole lot easier to apply Weights to statistics in Proc Mean/summary.

 

And more.

 

PaigeMiller
Diamond | Level 26

I think @Reeza mentioned this in the other thread. PROC SUMMARY/PROC MEANS is much better if you have many variables to sum, or many statistics to produce, or many different ways to slice the data, or any combination of these. In this very simple example, which covers only a small percent of real world applications, PROC SQL is fine. In other less simple examples, PROC SUMMARY/PROC MEANS will get you there with less typing, and probably less execution time as well. Keep that in mind as your SAS use continues and you find more complicated problems for which you have to slice and dice the data in several ways, and compute one or multiple statistics on one or multiple variables.


And, @EC27556 , as you know there is a phrase in the English language "Please listen to reason", but here in the SAS Communities, it is "Please listen to Reeza". And please listen to @ballardw as well.

--
Paige Miller
novinosrin
Tourmaline | Level 20
group by id,date;
EC27556
Quartz | Level 8
Great, thanks! And so, generally, if you have repeating observations that are identical in terms of variable values other than the cost variable, you would simply keep adding the other variable names to the select and group by statements then? does the order matter? I assume ID always has to be first in the by statement but otherwise no as long as ID is unique?

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 957 views
  • 4 likes
  • 5 in conversation