BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
m1986MM
Obsidian | Level 7

Hello everyone,

My dataset has 40 variables, for simplicity I show 7 of  them below:

branchstatecityproductmodeldatesale

I need to find the average sale of each product and model at any date, which is in the weekly manner.

I tries the following code:

proc sql;

create table average as

select product, model, date, mean(sale) as sale_ave

from original

group by product, model, date;

quit;

However, I found that this code gives me multiple values as sale_ave for each group of product, model, date.

How can I fix this problem to have the real average value of the sale?

Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Try instead:

proc sql;

create table average as

select

     product,

     model,

     intnx("WEEK.2",date,0) as week format=weekv5.,

     mean(sale) as sale_ave

from original

group by product, model, calculated week;

quit;

WEEK.2 weeks start on Mondays, like the WEEKV. format

PG

PG

View solution in original post

17 REPLIES 17
Steelers_In_DC
Barite | Level 11

The code you have is correct for what you are describing.  Try giving an example of what you have as input and what you want as output.

m1986MM
Obsidian | Level 7

This is part of the output I got:

product modeldatesale_ave
MCD100MCD100-0608W010.12
MCD100MCD100-0608W010.88
MCD100MCD100-0608W01-0.08
MCD100MCD100-0608W020.22
MCD100MCD100-0608W020.45

As you see for the same amount of product, model, and date I have multiple value as average.

Steelers_In_DC
Barite | Level 11

Can you provide some input as well?  How big is the dataset and how it sit inputted?  It's possible that there are spaces or other characters entered at some point.  Try a step like the 'prep' table below and see if this helps.

data have;

infile cards dsd;

informat date mmddyy10.;

format date mmddyy10.;

input product$ model$ date sales;

cards;

1,001,01/01/1990,300

1,001,01/01/1990,100

1,001,01/01/1990,100

1,001,01/02/1990,200

1,001,01/02/1990,300

1,001,01/02/1990,100

2,001,01/01/1990,222

2,001,01/01/1990,333

2,002,01/01/1990,555

2,002,01/01/1990,444

2,003,01/01/1990,999

2,003,01/01/1990,444

3,001,01/01/1990,222

3,002,01/01/1990,777

3,003,01/01/1990,300

3,004,01/01/1990,800

3,005,01/01/1990,500

3,006,01/01/1990,666

;

data prep;

set have;

product = strip(product);

model = strip(model);

date = strip(date);

run;

proc sql;

create table want as

select *,avg(sales) as average

from prep

group by product,model,date;

m1986MM
Obsidian | Level 7

The dataset is huge, more that 20m observations and 44 vars. Because of that I can't uploaded here. The variables' formats are character, except date',which is date format, and sale's, which is numeric.

Steelers_In_DC
Barite | Level 11

Here's your solution, not ideal because of the size of the dataset but this will work.  As far as I know you can't do this in one step, you'll have to break it down like this.

data have;

infile cards dsd;

informat date mmddyy10.;

format date weekv5.;

input product$ model$ date sales;

cards;

1,001,01/01/1990,300

1,001,01/02/1990,100

1,001,01/03/1990,100

1,001,01/04/1990,200

1,001,01/05/1990,300

1,001,01/06/1990,100

2,001,01/05/1990,222

2,001,01/06/1990,333

2,002,01/07/1990,555

2,002,01/08/1990,444

2,003,01/09/1990,999

2,003,01/10/1990,444

3,001,01/05/1990,222

3,002,01/06/1990,777

3,003,01/06/1990,300

3,004,01/07/1990,800

3,005,01/08/1990,500

3,006,01/09/1990,666

;

data prep;

set have;

date_c = input(put(date,weekv5.),$5.);

run;

proc sql;

create table want as

select distinct product,model,date,sales,avg(sales) as average

from prep

group by product,model,date_c;

m1986MM
Obsidian | Level 7

the output of date_c = input(put(date,weekv5.),$5.); is character, but I need it to be numeric to do date functions. Do you know how I can make it?

Reeza
Super User

Keep both the variable you formatted and the created variable.

Or as I initially suggested, use a proc means or proc summary and avoid the extra step entirely.

proc means data=original mean nway;

class product model date;

var sale;

ods output summarytable=want;

run;

Haikuo
Onyx | Level 15

Lost in translation. What you really want maybe the average on each date for ALL products and models instead of EACH product and models? otherwise, it hardly makes any sense to me.

Haikuo

m1986MM
Obsidian | Level 7

no for each product. As I wrote in that code, I want the data to be grouped by product, model, and date. So I can have the average of sales for a specific product and model at a specific date.

Reeza
Super User

Proc SQL does not use formats when calculating GROUP BY. Either create a Week variable or use a proc such as proc means or summary.

You can see this by removing the format from the date variable in your output data set. It will show different days.

m1986MM
Obsidian | Level 7

I tried to create a week variable. I tried these ways:

data new; set old;

format date weekv5.;

run;

Does the above code change the value of date into weekly intervals?

I also used this:

date_week=put(date, weekv5.);

but the output is in the character format but I need it to be numeric.

PGStats
Opal | Level 21

Try instead:

proc sql;

create table average as

select

     product,

     model,

     intnx("WEEK.2",date,0) as week format=weekv5.,

     mean(sale) as sale_ave

from original

group by product, model, calculated week;

quit;

WEEK.2 weeks start on Mondays, like the WEEKV. format

PG

PG
m1986MM
Obsidian | Level 7

It seems that with this code the value of 'calculated week' is still in date format but not week, right?

If so, do you know how I can make it to be weekly, since I need to group my observation and find the average in weekly intervals.

Thanks,

Reeza
Super User

Apply a week format to the date variable within the proc means.

proc means data=original mean nway;

class product model date;

var sale;

ods output summarytable=want;

format date weekv5.;

run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 17 replies
  • 1946 views
  • 8 likes
  • 5 in conversation