Help using Base SAS procedures

Taking average

Accepted Solution Solved
Reply
Contributor
Posts: 65
Accepted Solution

Taking average

Hello everyone,

My dataset has 40 variables, for simplicity I show 7 of  them below:

branchstatecityproductmodeldatesale

I need to find the average sale of each product and model at any date, which is in the weekly manner.

I tries the following code:

proc sql;

create table average as

select product, model, date, mean(sale) as sale_ave

from original

group by product, model, date;

quit;

However, I found that this code gives me multiple values as sale_ave for each group of product, model, date.

How can I fix this problem to have the real average value of the sale?

Thanks,


Accepted Solutions
Solution
‎07-17-2015 06:07 PM
Respected Advisor
Posts: 4,920

Re: Taking average

Try instead:

proc sql;

create table average as

select

     product,

     model,

     intnx("WEEK.2",date,0) as week format=weekv5.,

     mean(sale) as sale_ave

from original

group by product, model, calculated week;

quit;

WEEK.2 weeks start on Mondays, like the WEEKV. format

PG

PG

View solution in original post


All Replies
Valued Guide
Posts: 860

Re: Taking average

The code you have is correct for what you are describing.  Try giving an example of what you have as input and what you want as output.

Contributor
Posts: 65

Re: Taking average

Posted in reply to Steelers_In_DC

This is part of the output I got:

product modeldatesale_ave
MCD100MCD100-0608W010.12
MCD100MCD100-0608W010.88
MCD100MCD100-0608W01-0.08
MCD100MCD100-0608W020.22
MCD100MCD100-0608W020.45

As you see for the same amount of product, model, and date I have multiple value as average.

Valued Guide
Posts: 860

Re: Taking average

Can you provide some input as well?  How big is the dataset and how it sit inputted?  It's possible that there are spaces or other characters entered at some point.  Try a step like the 'prep' table below and see if this helps.

data have;

infile cards dsd;

informat date mmddyy10.;

format date mmddyy10.;

input product$ model$ date sales;

cards;

1,001,01/01/1990,300

1,001,01/01/1990,100

1,001,01/01/1990,100

1,001,01/02/1990,200

1,001,01/02/1990,300

1,001,01/02/1990,100

2,001,01/01/1990,222

2,001,01/01/1990,333

2,002,01/01/1990,555

2,002,01/01/1990,444

2,003,01/01/1990,999

2,003,01/01/1990,444

3,001,01/01/1990,222

3,002,01/01/1990,777

3,003,01/01/1990,300

3,004,01/01/1990,800

3,005,01/01/1990,500

3,006,01/01/1990,666

;

data prep;

set have;

product = strip(product);

model = strip(model);

date = strip(date);

run;

proc sql;

create table want as

select *,avg(sales) as average

from prep

group by product,model,date;

Contributor
Posts: 65

Re: Taking average

Posted in reply to Steelers_In_DC

The dataset is huge, more that 20m observations and 44 vars. Because of that I can't uploaded here. The variables' formats are character, except date',which is date format, and sale's, which is numeric.

Valued Guide
Posts: 860

Re: Taking average

Here's your solution, not ideal because of the size of the dataset but this will work.  As far as I know you can't do this in one step, you'll have to break it down like this.

data have;

infile cards dsd;

informat date mmddyy10.;

format date weekv5.;

input product$ model$ date sales;

cards;

1,001,01/01/1990,300

1,001,01/02/1990,100

1,001,01/03/1990,100

1,001,01/04/1990,200

1,001,01/05/1990,300

1,001,01/06/1990,100

2,001,01/05/1990,222

2,001,01/06/1990,333

2,002,01/07/1990,555

2,002,01/08/1990,444

2,003,01/09/1990,999

2,003,01/10/1990,444

3,001,01/05/1990,222

3,002,01/06/1990,777

3,003,01/06/1990,300

3,004,01/07/1990,800

3,005,01/08/1990,500

3,006,01/09/1990,666

;

data prep;

set have;

date_c = input(put(date,weekv5.),$5.);

run;

proc sql;

create table want as

select distinct product,model,date,sales,avg(sales) as average

from prep

group by product,model,date_c;

Contributor
Posts: 65

Re: Taking average

Posted in reply to Steelers_In_DC

the output of date_c = input(put(date,weekv5.),$5.); is character, but I need it to be numeric to do date functions. Do you know how I can make it?

Super User
Posts: 19,772

Re: Taking average

Keep both the variable you formatted and the created variable.

Or as I initially suggested, use a proc means or proc summary and avoid the extra step entirely.

proc means data=original mean nway;

class product model date;

var sale;

ods output summarytable=want;

run;

Respected Advisor
Posts: 3,156

Re: Taking average

Lost in translation. What you really want maybe the average on each date for ALL products and models instead of EACH product and models? otherwise, it hardly makes any sense to me.

Haikuo

Contributor
Posts: 65

Re: Taking average

no for each product. As I wrote in that code, I want the data to be grouped by product, model, and date. So I can have the average of sales for a specific product and model at a specific date.

Super User
Posts: 19,772

Re: Taking average

Proc SQL does not use formats when calculating GROUP BY. Either create a Week variable or use a proc such as proc means or summary.

You can see this by removing the format from the date variable in your output data set. It will show different days.

Contributor
Posts: 65

Re: Taking average

I tried to create a week variable. I tried these ways:

data new; set old;

format date weekv5.;

run;

Does the above code change the value of date into weekly intervals?

I also used this:

date_week=put(date, weekv5.);

but the output is in the character format but I need it to be numeric.

Solution
‎07-17-2015 06:07 PM
Respected Advisor
Posts: 4,920

Re: Taking average

Try instead:

proc sql;

create table average as

select

     product,

     model,

     intnx("WEEK.2",date,0) as week format=weekv5.,

     mean(sale) as sale_ave

from original

group by product, model, calculated week;

quit;

WEEK.2 weeks start on Mondays, like the WEEKV. format

PG

PG
Contributor
Posts: 65

Re: Taking average

It seems that with this code the value of 'calculated week' is still in date format but not week, right?

If so, do you know how I can make it to be weekly, since I need to group my observation and find the average in weekly intervals.

Thanks,

Super User
Posts: 19,772

Re: Taking average

Apply a week format to the date variable within the proc means.

proc means data=original mean nway;

class product model date;

var sale;

ods output summarytable=want;

format date weekv5.;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 643 views
  • 8 likes
  • 5 in conversation