Hello everyone,
My dataset has 40 variables, for simplicity I show 7 of them below:
branch | state | city | product | model | date | sale |
---|---|---|---|---|---|---|
I need to find the average sale of each product and model at any date, which is in the weekly manner.
I tries the following code:
proc sql;
create table average as
select product, model, date, mean(sale) as sale_ave
from original
group by product, model, date;
quit;
However, I found that this code gives me multiple values as sale_ave for each group of product, model, date.
How can I fix this problem to have the real average value of the sale?
Thanks,
Try instead:
proc sql;
create table average as
select
product,
model,
intnx("WEEK.2",date,0) as week format=weekv5.,
mean(sale) as sale_ave
from original
group by product, model, calculated week;
quit;
WEEK.2 weeks start on Mondays, like the WEEKV. format
PG
The code you have is correct for what you are describing. Try giving an example of what you have as input and what you want as output.
This is part of the output I got:
product | model | date | sale_ave |
MCD100 | MCD100-06 | 08W01 | 0.12 |
MCD100 | MCD100-06 | 08W01 | 0.88 |
MCD100 | MCD100-06 | 08W01 | -0.08 |
MCD100 | MCD100-06 | 08W02 | 0.22 |
MCD100 | MCD100-06 | 08W02 | 0.45 |
As you see for the same amount of product, model, and date I have multiple value as average.
Can you provide some input as well? How big is the dataset and how it sit inputted? It's possible that there are spaces or other characters entered at some point. Try a step like the 'prep' table below and see if this helps.
data have;
infile cards dsd;
informat date mmddyy10.;
format date mmddyy10.;
input product$ model$ date sales;
cards;
1,001,01/01/1990,300
1,001,01/01/1990,100
1,001,01/01/1990,100
1,001,01/02/1990,200
1,001,01/02/1990,300
1,001,01/02/1990,100
2,001,01/01/1990,222
2,001,01/01/1990,333
2,002,01/01/1990,555
2,002,01/01/1990,444
2,003,01/01/1990,999
2,003,01/01/1990,444
3,001,01/01/1990,222
3,002,01/01/1990,777
3,003,01/01/1990,300
3,004,01/01/1990,800
3,005,01/01/1990,500
3,006,01/01/1990,666
;
data prep;
set have;
product = strip(product);
model = strip(model);
date = strip(date);
run;
proc sql;
create table want as
select *,avg(sales) as average
from prep
group by product,model,date;
The dataset is huge, more that 20m observations and 44 vars. Because of that I can't uploaded here. The variables' formats are character, except date',which is date format, and sale's, which is numeric.
Here's your solution, not ideal because of the size of the dataset but this will work. As far as I know you can't do this in one step, you'll have to break it down like this.
data have;
infile cards dsd;
informat date mmddyy10.;
format date weekv5.;
input product$ model$ date sales;
cards;
1,001,01/01/1990,300
1,001,01/02/1990,100
1,001,01/03/1990,100
1,001,01/04/1990,200
1,001,01/05/1990,300
1,001,01/06/1990,100
2,001,01/05/1990,222
2,001,01/06/1990,333
2,002,01/07/1990,555
2,002,01/08/1990,444
2,003,01/09/1990,999
2,003,01/10/1990,444
3,001,01/05/1990,222
3,002,01/06/1990,777
3,003,01/06/1990,300
3,004,01/07/1990,800
3,005,01/08/1990,500
3,006,01/09/1990,666
;
data prep;
set have;
date_c = input(put(date,weekv5.),$5.);
run;
proc sql;
create table want as
select distinct product,model,date,sales,avg(sales) as average
from prep
group by product,model,date_c;
the output of date_c = input(put(date,weekv5.),$5.); is character, but I need it to be numeric to do date functions. Do you know how I can make it?
Keep both the variable you formatted and the created variable.
Or as I initially suggested, use a proc means or proc summary and avoid the extra step entirely.
proc means data=original mean nway;
class product model date;
var sale;
ods output summarytable=want;
run;
Lost in translation. What you really want maybe the average on each date for ALL products and models instead of EACH product and models? otherwise, it hardly makes any sense to me.
Haikuo
no for each product. As I wrote in that code, I want the data to be grouped by product, model, and date. So I can have the average of sales for a specific product and model at a specific date.
Proc SQL does not use formats when calculating GROUP BY. Either create a Week variable or use a proc such as proc means or summary.
You can see this by removing the format from the date variable in your output data set. It will show different days.
I tried to create a week variable. I tried these ways:
data new; set old;
format date weekv5.;
run;
Does the above code change the value of date into weekly intervals?
I also used this:
date_week=put(date, weekv5.);
but the output is in the character format but I need it to be numeric.
Try instead:
proc sql;
create table average as
select
product,
model,
intnx("WEEK.2",date,0) as week format=weekv5.,
mean(sale) as sale_ave
from original
group by product, model, calculated week;
quit;
WEEK.2 weeks start on Mondays, like the WEEKV. format
PG
It seems that with this code the value of 'calculated week' is still in date format but not week, right?
If so, do you know how I can make it to be weekly, since I need to group my observation and find the average in weekly intervals.
Thanks,
Apply a week format to the date variable within the proc means.
proc means data=original mean nway;
class product model date;
var sale;
ods output summarytable=want;
format date weekv5.;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.