Hello everyone, I have a huge dataset with more than 30m observations. The main structure of the dataset is similar to table below. However, the number of variables are 46 (not all of them useful). Company_Nm size Brach product date cost X X1 A X X2 B X X3 C X X4 D Y Y1 A Y Y2 B Y Y3 C Y Y4 D Y Y5 E Z Z1 A In this dataset what I want to analyze is the cost of different product, offered by different branches of each company. The data for cost is available since 1995 in a weekly order (not similar day of the week for all of the observations). Now what I want to do is taking the average cost for each product, offered by each branch, for every month. In this way I can both reduce the number on obs and analyze the price variable cross sectionally. The date viable is in this format: YYYY-MM-DD. Because of the large number of variables, I prefer to not use CLASS statement. However, even if I have to use CLASS, how can I write the code to take the average for every month and create a dataset that has date (YYYY-MM) and Ave_price instead of the last two variables in the current data set. I greatly appreciate any and all suggestions.
... View more