How to merge appended values (continued from yesterday)

EC27556 · Posted 10-16-2020 12:55 PM

Yesterday I posted this:

I currently have a situation where I have:

ID           Cost
1               10
1                5
2               10
3               10
3               10

What code can I use to convert the above into:

ID           Cost
1               15
2               10
3               20

And someone gave an answer as this:

data have;
    input id cost;
    datalines; 1    10
                     1    5
                     2   10
                    3   10
                    3   10 ;
run;

proc sql;
create table want as
select id, sum(cost) as cost
from have
groupby id;
quit;

However, what if my data instead looked like this:

data have;
input id cost date;
datalines; 1 10 Monday

1 5 Monday

2 10 Tuesday

3 10 Wednesday

3 10 Wednesday;

How would you amend the above code to include the date in the output table?

proc sql;

create table want as

select id, date, sum(cost) as cost

from have

group by id;

quit;

this fails to append the observations so do I instead need to also add date to the by statement? (Bearing in mind that each id always has the same day associated with it)

Thanks! 🙂

PaigeMiller · Posted 10-16-2020 01:01 PM

Is this a realistic example? Every transaction on Monday, regardless of whether it was Monday this week or Monday last week, or Monday from 3.25 years ago, will get lumped together. Is that what you want? If not, can you give us a more realistic example?

Assuming you have more realistic dates, then PROC SUMMARY or PROC MEANS is the way to go here.

proc summary nway data=have;
    class id date;
    var cost;
    output out=want sum=;
run;

--
Paige Miller

EC27556 · Posted 10-16-2020 01:26 PM

Thanks, I'm curious to know why you would recommend proc means/summary over sql though!

The example wasn't very logical, I admit! I am essentially considering a situation where you have lots of repeating observations which are almost identical except for cost value. I.e. if you have id=1 for 3 observations in the dataset, all the variable values will be the same (e.g. height, weight) apart from cost. I am looking for code which enables to turn those 3 observations into 1 observation with a summed cost variable!

ballardw · Posted 10-16-2020 01:53 PM

@EC27556 wrote:
Thanks, I'm curious to know why you would recommend proc means/summary over sql though!

Means/Summary can be faster than SQL.

Summary can produce groups of summaries of 1) over the whole data set, 2) the id values only, 3) the dates only and 4) the Id*date combinations. With one call to the procedure.

Summary allows use of features like variable lists. Consider a data set with 100 variables. In Proc SQL you have to list every single variable to get the summary. You could sum all the numeric variables by using: Var _numeric_. Or lists of common based names like var_: using all variables whose names start with Var_, or the -- (two dashes) that get contiguous variables, or iterated list like var1 - var25 to summarize 25 variables with names like that.

The AUTONAME feature means that if you request multiple statistics for variables then the output variables are named with the statistic as a suffix. SQL requires setting a specific name for each.

And the syntax for mixing which variables get which statistics when they are not all getting the same statistic is again way shorter than Proc SQL: mean(var1-var10) = stddev(var5-var15) = , for example instead of 10 Mean(variable) and a different 10 stddev(variable).

And I'm not sure if SQL does percentiles.

Proc means/summary can use multilabel formats for variables, Proc SQL can't.

Some other options like COMPLETETYPES can create output that Proc SQL won't without additional programming.

Check the syntax for TYPES, ID, and WAYS for other output that would require lots of Proc SQL programming to accomplish.

It is a whole lot easier to apply Weights to statistics in Proc Mean/summary.

And more.

PaigeMiller · Posted 10-16-2020 03:30 PM

I think @Reeza mentioned this in the other thread. PROC SUMMARY/PROC MEANS is much better if you have many variables to sum, or many statistics to produce, or many different ways to slice the data, or any combination of these. In this very simple example, which covers only a small percent of real world applications, PROC SQL is fine. In other less simple examples, PROC SUMMARY/PROC MEANS will get you there with less typing, and probably less execution time as well. Keep that in mind as your SAS use continues and you find more complicated problems for which you have to slice and dice the data in several ways, and compute one or multiple statistics on one or multiple variables.

And, @EC27556 , as you know there is a phrase in the English language "Please listen to reason", but here in the SAS Communities, it is "Please listen to Reeza". And please listen to @ballardw as well.

--
Paige Miller

novinosrin · Posted 10-16-2020 01:05 PM

group by id,date;

Kurt_Bremser · Posted 10-16-2020 01:06 PM

Add the new group:

proc sql;
create table want as
select id, date, sum(cost) as cost
from have
group by id, date;
quit;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

EC27556 · Posted 10-16-2020 01:31 PM

Great, thanks! And so, generally, if you have repeating observations that are identical in terms of variable values other than the cost variable, you would simply keep adding the other variable names to the select and group by statements then? does the order matter? I assume ID always has to be first in the by statement but otherwise no as long as ID is unique?

How to merge appended values (continued from yesterday)

Re: How to merge appended values (continued from yesterday)

Re: How to merge appended values (continued from yesterday)

Re: How to merge appended values (continued from yesterday)

Re: How to merge appended values (continued from yesterday)

Re: How to merge appended values (continued from yesterday)

Re: How to merge appended values (continued from yesterday)

Re: How to merge appended values (continued from yesterday)

Classroom Training Available!