I am trying to find average amount by product_group, product, YM and then using that average for new YM.
for e.g
Until now I tried partition by and indexing but I think it doesn't work in sas.
Hoping that someone can point me in right direction on how to solve this problem.
data have; input YM Product_Group$ Product$ Amount; datalines; 200101 X P1 1.2 200201 X P1 8.9 200301 X P1 432.4 200101 Y P2 24.23 200201 Y P2 842.9 200301 Y P2 43.4 run;
I want output something like:
data want; input YM Product_Group$ Product$ Average; datalines; 200401 X P1 147.5 200401 Y P1 303.51 run;
@John04 wrote:
I am trying to find average amount by product_group, product, YM and then using that average for new YM.
for e.g
Until now I tried partition by and indexing but I think it doesn't work in sas.
Hoping that someone can point me in right direction on how to solve this problem.
data have; input YM Product_Group$ Product$ Amount; datalines; 200101 X P1 1.2 200201 X P1 8.9 200301 X P1 432.4 200101 Y P2 24.23 200201 Y P2 842.9 200301 Y P2 43.4 run;I want output something like:
data want; input YM Product_Group$ Product$ Average; datalines; 200401 X P1 147.5 200401 Y P1 303.51 run;
This is the easiest way I know to get the mean amount by product_group and Product
proc summary data=have nway; class product_group product; var amount; output out=want (drop=_:) mean=; run;
But getting a value that does not exist in the source data, ie that 200401, as part of any summary will require further processing, such as data step to add the value.
Large economy size hint: Instead of using random numbers containing parts of a date, create actual date values with appropriate formats such as:
data have; input YM :yymmn6. Product_Group$ Product$ Amount; format ym yymmn.; datalines; 200101 X P1 1.2 200201 X P1 8.9 200301 X P1 432.4 200101 Y P2 24.23 200201 Y P2 842.9 200301 Y P2 43.4 run;
Then you have all the power of the SAS functions such as INTNX to increment years, such as Intnx('year',ym,1) instead of weird arithmetic like 200101 +100 to simulate a next year. Plus graphs with dates will space tickmarks in nicer intervals. The above would assume the first day of a month for date.
Why do you need to create an average by product and YM? There is only 1 data point for each product and YM, so the average is just the value of that data point.
data want;
set have;
run;
Perhaps a more representative example data set would help here.
@John04 wrote:
Goal is to use the Average value as the "Inline Predicted value" for the next YM; incase it would be for 200401 for product X and Y.
I'm afraid this doesn't mean anything to me in the context of your original question. Please explain in more detail. And please provide a more representative data set.
Again, you haven't provided a meaningful example, you are computing means of ONE data point in each YM/Product combination, which is completely unnecessary, there's no programming involved, the mean of a single data point is that single data point. And then of course the desired output can't be computed from the shown input, as there is no 200401 in the input.
Please put some effort into creating meaningful and realistic examples of the input data, and output data that is based upon the input data.
Should you really be talking about a situation where you have multiple records for a YM/Product combination, this code will compute the means.
data have;
input YM Product_Group Product Amount;
datalines;
200101 X P1 1.2
200201 X P1 8.9
200301 X P1 432.4
200101 Y P2 24.23
200201 Y P2 842.9
200301 Y P2 43.4
run;
proc summary data=have nway;
class ym product;
var amount;
output out=want mean=;
Run;
First, please test your data step code. It generates missing values for Product and Product_group because you don't have provision for reading character variables.
I think you meant something more like:
data have; input YM Product_Group $ Product $ Amount; datalines; 200101 X P1 1.2 200101 Y P4 4.3 200102 X P2 2.3 200102 Y P5 4.6 200103 X P3 3.4 200103 Y P6 5.6 200201 X P1 8.9 200201 Y P4 35.6 200202 X P2 53.6 200202 Y P5 24.6 200203 X P3 346.7 200203 Y P6 23.5 200301 X P1 432.4 200301 Y P4 13.5 200302 X P2 1.25 200302 Y P5 32.6 200303 X P3 13.5 200303 Y P6 1236.1 ;
You don't say if you want a data set (for further manipulation) or report (people read these). This one way to get a report:
proc means data=have mean; class ym product; var amount; run;
Means will also create a data set with the OUTPUT statement.
Your example data is pretty boring for the question as your grouping variables don't show any duplicates so the "mean" is the value in the data.
Are you look for proc means?
proc means mean;
class YM Product;
var Amount;
run;
By the way, you input statement need to be modified:
input YM Product_Group$ Product$ Amount;
@John04 wrote:
I am trying to find average amount by product_group, product, YM and then using that average for new YM.
for e.g
Until now I tried partition by and indexing but I think it doesn't work in sas.
Hoping that someone can point me in right direction on how to solve this problem.
data have; input YM Product_Group$ Product$ Amount; datalines; 200101 X P1 1.2 200201 X P1 8.9 200301 X P1 432.4 200101 Y P2 24.23 200201 Y P2 842.9 200301 Y P2 43.4 run;I want output something like:
data want; input YM Product_Group$ Product$ Average; datalines; 200401 X P1 147.5 200401 Y P1 303.51 run;
This is the easiest way I know to get the mean amount by product_group and Product
proc summary data=have nway; class product_group product; var amount; output out=want (drop=_:) mean=; run;
But getting a value that does not exist in the source data, ie that 200401, as part of any summary will require further processing, such as data step to add the value.
Large economy size hint: Instead of using random numbers containing parts of a date, create actual date values with appropriate formats such as:
data have; input YM :yymmn6. Product_Group$ Product$ Amount; format ym yymmn.; datalines; 200101 X P1 1.2 200201 X P1 8.9 200301 X P1 432.4 200101 Y P2 24.23 200201 Y P2 842.9 200301 Y P2 43.4 run;
Then you have all the power of the SAS functions such as INTNX to increment years, such as Intnx('year',ym,1) instead of weird arithmetic like 200101 +100 to simulate a next year. Plus graphs with dates will space tickmarks in nicer intervals. The above would assume the first day of a month for date.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.