Hello everyone,
I have a huge dataset with more than 30m observations. The main structure of the dataset is similar to table below. However, the number of variables are 46 (not all of them useful).
| Company_Nm | size | Brach | product | date | cost |
| X | X1 | A | |||
| X | X2 | B | |||
| X | X3 | C | |||
| X | X4 | D | |||
| Y | Y1 | A | |||
| Y | Y2 | B | |||
| Y | Y3 | C | |||
| Y | Y4 | D | |||
| Y | Y5 | E | |||
| Z | Z1 | A |
In this dataset what I want to analyze is the cost of different product, offered by different branches of each company. The data for cost is available since 1995 in a weekly order (not similar day of the week for all of the observations).
Now what I want to do is taking the average cost for each product, offered by each branch, for every month. In this way I can both reduce the number on obs and analyze the price variable cross sectionally.
The date viable is in this format: YYYY-MM-DD. Because of the large number of variables, I prefer to not use CLASS statement. However, even if I have to use CLASS, how can I write the code to take the average for every month and create a dataset that has date (YYYY-MM) and Ave_price instead of the last two variables in the current data set.
I greatly appreciate any and all suggestions.
Below should work.
data have;
infile datalines truncover;
input Company_Id $ Branch_Id $ Prod_Id $;
format Date yymmdd10. cost dollar10.2;
Date='01Jan2012'd;
Date=intnx('week',date,0,'b');
do while (date<today());
cost=date/1000*_n_;
output;
Date=intnx('week',date,1,'b');
end;
datalines;
1 1 A
1 1 B
1 2 A
2 1 B
;
run;
proc sql;
create table want as
select
Company_Id
,Branch_Id
,Prod_Id
,intnx('month',Date,0,'b') as Month_Begin_Dt format=yymm7.
,avg(Cost) as avg_cost format=dollar10.2
from have
group by
Company_Id
,Branch_Id
,Prod_Id
,calculated Month_Begin_Dt
;
quit;
This is a untested code, however i believe it will work for you , could you please check with you data
proc sql;
create table want as select product,branch,month,ym,avg(cost) as avg_cost from (select *, month(date) as month, put(date,yymmd.) as ym from test) group by product,branch,month,ym;
run;
Thanks,
Jag
Below should work.
data have;
infile datalines truncover;
input Company_Id $ Branch_Id $ Prod_Id $;
format Date yymmdd10. cost dollar10.2;
Date='01Jan2012'd;
Date=intnx('week',date,0,'b');
do while (date<today());
cost=date/1000*_n_;
output;
Date=intnx('week',date,1,'b');
end;
datalines;
1 1 A
1 1 B
1 2 A
2 1 B
;
run;
proc sql;
create table want as
select
Company_Id
,Branch_Id
,Prod_Id
,intnx('month',Date,0,'b') as Month_Begin_Dt format=yymm7.
,avg(Cost) as avg_cost format=dollar10.2
from have
group by
Company_Id
,Branch_Id
,Prod_Id
,calculated Month_Begin_Dt
;
quit;
Thanks Patrick. Could you please explain below statements in the code?
1. Date='01Jan2012'd;
2. cost=date/1000*_n_;
Thanks.
That's just about mocking up some sample data. In 1) I need to start with some date and in 2) I just need to populate "cost" with some value.
Thanks.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.