pawandh
Fluorite | Level 6

store  sales
1      100
1      110
1      200
2      200
2      300
2      100

I want the cumulative sales for each store, using PROC SQL:

store  sales
1      100
1      210
1      410
2      200
2      500
2      600

How can I get this result using PROC SQL?
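For anyone who wants to run the code in the replies, here is a minimal sketch of the sample data as a SAS dataset named HAVE (the name the replies use). It assumes STORE and SALES are both numeric.

```sas
/* Sample data from the question; STORE and SALES assumed numeric */
data have;
  input store sales;
  datalines;
1 100
1 110
1 200
2 200
2 300
2 100
;
run;
```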

LinusH
Tourmaline | Level 20
This is not an ideal task for SQL; it's much simpler with a data step.
But I don't think storing cumulative values is a great idea; it's more of a task for a report. But perhaps that is what this is for?
Data never sleeps
RW9
Diamond | Level 26

Generally speaking, it's better to create a new variable with the calculation rather than overwriting the original data. Also, there is no order given in the data you provide, i.e. how do you know that 100 comes before 110 and 200? The calculation changes depending on the observation order in the data, which is not a good idea, because running the calculation again may not give the same result. Say the next time you look at the data, it looks like this:

1 100
1 105
1 110
1 200

Then your cumulative values no longer match: observation position is not a reliable way to order the records.

As for your calculation, as @LinusH has said, you're better off with a data step. Why do you not want to use that?

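data want;
  set have;
  retain cum_sum;                      /* keep the running total across observations */
  by store;                            /* HAVE must be sorted (or grouped) by STORE */
  if first.store then cum_sum=sales;   /* restart the total at each new store */
  else cum_sum=sum(cum_sum,sales);     /* add the current sales to the running total */
run;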
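An alternative sketch of the same data step uses the SAS sum statement (`variable + expression;`), which retains the accumulator automatically, starts it at 0, and treats missing SALES as 0, so no RETAIN statement is needed. It assumes HAVE is sorted by STORE; the sample data is recreated here so the block runs on its own.

```sas
/* Sample data from the question (STORE and SALES assumed numeric) */
data have;
  input store sales;
  datalines;
1 100
1 110
1 200
2 200
2 300
2 100
;
run;

/* HAVE is assumed to be sorted (or at least grouped) by STORE */
data want;
  set have;
  by store;
  if first.store then cum_sum = 0;  /* reset the running total for each store */
  cum_sum + sales;                  /* sum statement: implicitly retained, missing SALES adds 0 */
run;
```

This keeps the original SALES column and adds CUM_SUM as a new variable, in line with the advice not to overwrite the original data.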
FreelanceReinh
Jade | Level 19

I agree with @LinusH and @RW9 that the data step is much more appropriate for this task, but if you really want to use PROC SQL, try this:

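proc sql;
/* attach a row number N to each observation (MONOTONIC() is undocumented) */
create table temp as
select *, monotonic() as n
from have
order by store, n;

/* self-join: for each row A, sum SALES over all rows B of the same store with B.N <= A.N */
create table want as
select a.store, sum(b.sales) as sales
from temp a join temp b
on a.store=b.store & b.n<=a.n
group by a.store, a.n
order by store, a.n;
quit;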

Edit: Given your strong preference for PROC SQL, you probably have a "database mindset." In that case, however, your real data will most likely already contain a record identifier. So creating table TEMP (using the undocumented MONOTONIC() function, which I personally would avoid for production purposes) would be unnecessary, because something like variable N would already exist in your HAVE dataset.

Edit 2: Added the ORDER BY clause to the first CREATE statement, so as to make it less probable that PROC SQL permutes observations.
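A sketch of the same self-join without MONOTONIC(), assuming the record identifier N is created once, explicitly, in a data step from the documented automatic variable _N_. The output column name CUM_SALES is a hypothetical choice; A.N is kept in the SELECT list so no remerging is needed.

```sas
/* Sample data from the question (STORE and SALES assumed numeric) */
data have;
  input store sales;
  datalines;
1 100
1 110
1 200
2 200
2 300
2 100
;
run;

/* Assumption: create a stable record identifier N once, in a data step */
data temp;
  set have;
  n = _n_;   /* observation number at the time the identifier is created */
run;

proc sql;
  create table want as
  select a.store, a.n, sum(b.sales) as cum_sales
  from temp a join temp b
    on a.store = b.store and b.n <= a.n   /* all rows of the same store up to row A */
  group by a.store, a.n
  order by a.store, a.n;
quit;
```

Once N is stored in the data, rerunning the query gives the same cumulative values regardless of how the rows are later physically ordered.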

pawandh
Fluorite | Level 6

If I use sum(a.sales) instead of sum(b.sales), why is my output different? And how does "a.n<=b.n" work?

Please explain how the query works.

proc sql;
select a.store, sum(a.sales) as sale
from test a, test b
where a.store=b.store and a.n<=b.n
group by b.store, b.n;
quit;

store  sale
1      100
1      210
1      210
1      410
1      410
1      410
2      200
2      410
2      410
...

I am getting this output from the above query.

Please explain.

FreelanceReinh
Jade | Level 19

I think the last two 410s in your "output" should read 500.

That said, your code is almost correct: If you select b.store instead of a.store, the unwanted duplicate records will not occur.
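A sketch of the corrected query, following the fix described here (select b.store instead of a.store). It assumes a table TEST with a record identifier N, which the thread uses but does not show being created; here N is simply assigned from _N_. The result is written to a table (WANT, a hypothetical name), and b.n is also selected so the ORDER BY column appears in the output.

```sas
/* Assumed setup: TEST with a record identifier N */
data test;
  input store sales;
  n = _n_;   /* assumption: N is the observation number */
  datalines;
1 100
1 110
1 200
2 200
2 300
2 100
;
run;

proc sql;
  create table want as
  select b.store, b.n, sum(a.sales) as sale   /* b.store: one value per BY group, no remerge */
  from test a, test b
  where a.store=b.store and a.n<=b.n
  group by b.store, b.n
  order by b.store, b.n;
quit;
```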

 

The reason is that, although your WHERE condition makes a.store and b.store contain equal values, PROC SQL regards a.store and b.store as two separate columns. The GROUP BY clause requests a consolidation into "b.store-b.n BY groups" within the subset of the Cartesian product created by selecting from "test a, test b."

As a consequence, selecting b.store means selecting the single distinct value that b.store has in the respective BY group. Selecting a.store, however, is interpreted as the request to combine the aggregated values from the BY groups with values (namely of a.store) from original data (in the Cartesian product subset). This is documented in the log by the note:

NOTE: The query requires remerging summary statistics back with the original data.

So, the nicely cumulated sales values (100, 210, 410, ...) are matched to the a.store values from the BY groups: The first value occurs once, because for b.n=1 there is only one a.n value satisfying the WHERE condition a.n<=b.n (namely a.n=1). (Hence, the said BY group consists of a single observation.) The second value occurs twice, because for b.n=2 there are two a.n values satisfying a.n<=b.n: a.n=1 and a.n=2. (BY group has two obs.)  And so on.
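To see these BY groups directly, here is a sketch that lists the (b.n, a.n) pairs kept by the WHERE condition, again assuming a TEST table whose N is taken from _N_ (PAIRS is a hypothetical table name). Each b.n group contains one row for every a.n <= b.n within the same store, so for each store the groups have 1, 2, and 3 rows, 12 rows in total. That is exactly how many rows the original query returns when it remerges on a.store.

```sas
/* Assumed setup: TEST with a record identifier N */
data test;
  input store sales;
  n = _n_;
  datalines;
1 100
1 110
1 200
2 200
2 300
2 100
;
run;

proc sql;
  create table pairs as
  select b.n as b_n, a.n as a_n, a.sales   /* one row per matching (B, A) pair */
  from test a, test b
  where a.store=b.store and a.n<=b.n
  order by b_n, a_n;
quit;
```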

LinusH
Tourmaline | Level 20

I'm changing the subject to better describe the question.
