- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello everyone,
My dataset has 40 variables, for simplicity I show 7 of them below:
branch | state | city | product | model | date | sale |
---|---|---|---|---|---|---|
I need to find the average sale of each product and model at any date, which is in the weekly manner.
I tries the following code:
proc sql;
create table average as
select product, model, date, mean(sale) as sale_ave
from original
group by product, model, date;
quit;
However, I found that this code gives me multiple values as sale_ave for each group of product, model, date.
How can I fix this problem to have the real average value of the sale?
Thanks,
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try instead:
proc sql;
create table average as
select
product,
model,
intnx("WEEK.2",date,0) as week format=weekv5.,
mean(sale) as sale_ave
from original
group by product, model, calculated week;
quit;
WEEK.2 weeks start on Mondays, like the WEEKV. format
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The code you have is correct for what you are describing. Try giving an example of what you have as input and what you want as output.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is part of the output I got:
product | model | date | sale_ave |
MCD100 | MCD100-06 | 08W01 | 0.12 |
MCD100 | MCD100-06 | 08W01 | 0.88 |
MCD100 | MCD100-06 | 08W01 | -0.08 |
MCD100 | MCD100-06 | 08W02 | 0.22 |
MCD100 | MCD100-06 | 08W02 | 0.45 |
As you see for the same amount of product, model, and date I have multiple value as average.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you provide some input as well? How big is the dataset and how it sit inputted? It's possible that there are spaces or other characters entered at some point. Try a step like the 'prep' table below and see if this helps.
data have;
infile cards dsd;
informat date mmddyy10.;
format date mmddyy10.;
input product$ model$ date sales;
cards;
1,001,01/01/1990,300
1,001,01/01/1990,100
1,001,01/01/1990,100
1,001,01/02/1990,200
1,001,01/02/1990,300
1,001,01/02/1990,100
2,001,01/01/1990,222
2,001,01/01/1990,333
2,002,01/01/1990,555
2,002,01/01/1990,444
2,003,01/01/1990,999
2,003,01/01/1990,444
3,001,01/01/1990,222
3,002,01/01/1990,777
3,003,01/01/1990,300
3,004,01/01/1990,800
3,005,01/01/1990,500
3,006,01/01/1990,666
;
data prep;
set have;
product = strip(product);
model = strip(model);
date = strip(date);
run;
proc sql;
create table want as
select *,avg(sales) as average
from prep
group by product,model,date;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The dataset is huge, more that 20m observations and 44 vars. Because of that I can't uploaded here. The variables' formats are character, except date',which is date format, and sale's, which is numeric.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here's your solution, not ideal because of the size of the dataset but this will work. As far as I know you can't do this in one step, you'll have to break it down like this.
data have;
infile cards dsd;
informat date mmddyy10.;
format date weekv5.;
input product$ model$ date sales;
cards;
1,001,01/01/1990,300
1,001,01/02/1990,100
1,001,01/03/1990,100
1,001,01/04/1990,200
1,001,01/05/1990,300
1,001,01/06/1990,100
2,001,01/05/1990,222
2,001,01/06/1990,333
2,002,01/07/1990,555
2,002,01/08/1990,444
2,003,01/09/1990,999
2,003,01/10/1990,444
3,001,01/05/1990,222
3,002,01/06/1990,777
3,003,01/06/1990,300
3,004,01/07/1990,800
3,005,01/08/1990,500
3,006,01/09/1990,666
;
data prep;
set have;
date_c = input(put(date,weekv5.),$5.);
run;
proc sql;
create table want as
select distinct product,model,date,sales,avg(sales) as average
from prep
group by product,model,date_c;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
the output of date_c = input(put(date,weekv5.),$5.); is character, but I need it to be numeric to do date functions. Do you know how I can make it?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Keep both the variable you formatted and the created variable.
Or as I initially suggested, use a proc means or proc summary and avoid the extra step entirely.
proc means data=original mean nway;
class product model date;
var sale;
ods output summarytable=want;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Lost in translation. What you really want maybe the average on each date for ALL products and models instead of EACH product and models? otherwise, it hardly makes any sense to me.
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
no for each product. As I wrote in that code, I want the data to be grouped by product, model, and date. So I can have the average of sales for a specific product and model at a specific date.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Proc SQL does not use formats when calculating GROUP BY. Either create a Week variable or use a proc such as proc means or summary.
You can see this by removing the format from the date variable in your output data set. It will show different days.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I tried to create a week variable. I tried these ways:
data new; set old;
format date weekv5.;
run;
Does the above code change the value of date into weekly intervals?
I also used this:
date_week=put(date, weekv5.);
but the output is in the character format but I need it to be numeric.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try instead:
proc sql;
create table average as
select
product,
model,
intnx("WEEK.2",date,0) as week format=weekv5.,
mean(sale) as sale_ave
from original
group by product, model, calculated week;
quit;
WEEK.2 weeks start on Mondays, like the WEEKV. format
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It seems that with this code the value of 'calculated week' is still in date format but not week, right?
If so, do you know how I can make it to be weekly, since I need to group my observation and find the average in weekly intervals.
Thanks,
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Apply a week format to the date variable within the proc means.
proc means data=original mean nway;
class product model date;
var sale;
ods output summarytable=want;
format date weekv5.;
run;