Hello,
I've written a query below that gets the customer which generated the maximum revenue or spent the most money. But I would like to get the customer(s) generated the most money in each individual year. The set contains data from 2007 to 2015 and date is in mmddyy10. How would I go about achieving this with proc sql? thanks a lot for the help
proc sql; select Customer_ID, sum from( select Customer_ID, sum(total_retail_Price) as Sum label= "Total Revenue" format= dollar10.2 from testdata group by customer_ID ) having sum= max(sum); quit;
I think with "real" SQL this would be simple because you could use the RANK OVER / PARTITION BY functions that PROC SQL lacks (I think?) - otherwise might be awkward. Here's a data step solution that assigns an integer to each customer ID (which I guessed was a character variable, length $20 - change as appropriate) for the purpose of gathering up data, then back-converts to customer ID at the end.
proc sql noprint;
select count(distinct customer_id) into :ncust from testdata;
select min(year(dt)), max(year(dt)) into :firstyr, :lastyr from testdata;
create table CN as select "cust2num" as fmtname, "C" as type,
customer_id as start, customer_id as end, monotonic() as label
from
(select distinct customer_id from testdata);
create table NC as select "num2cust" as fmtname, "N" as type,
monotonic() as start, monotonic() as end, customer_id as label
from
(select distinct customer_id from testdata);
quit;
proc format cntlin=CN; run;
proc format cntlin=NC; run;
data top_customer_by_year (keep=year customer total);
set testdata END=LAST;
length year 3 customer $20 total 8;
array T {&ncust,&firstyr:&lastyr} _temporary_;
T[put(customer_id,$cust2num.)*1,year(dt)]+total_retail_price;
if last then do;
do year=lbound(T,2) to hbound(T,2);
customer = '';
total = .;
do c=1 to dim(T,1);
if T[c,year]>total then do;
customer=put(c,num2cust.);
total=T[c,year];
end;
end;
output;
end;
end;
run;
can you please provide a sample to work with and your expected output from the sample?
Try this:
proc sql;
select
Customer_ID,
year format=year4.,
sum label= "Total Revenue" format= dollar10.2
from(
select
Customer_ID,
intnx("year", date, 0) as year,
sum(total_retail_Price) as Sum
from testdata
group by customer_ID, calculated year
)
group by year
having sum= max(sum);
quit;
this assumes that your dates are SAS dates. If they are SAS datetimes, replace date with datepart(date)
In the result set, year is the SAS date of the first day of the year, formatted as an integer. If you prefer a simple number variable, use year(year) as year format=4.0.
You just need to use GROUP BY. You might also need to transform your date variable to have just the YEAR value.
Example:
proc sql ;
create table want as
select
year(date) as year
,max(volume) as max_volume
,mean(volume) as mean_volume
from sashelp.stocks
group by 1
;
quit;
This is pretty simple using two PROC SUMMARY calls. The first PROC SUMMARY computes the sum of the sales by customer_id and year; the second takes the results of the first and finds the maximum values per year and the customer_id associated with the maximum value.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.